Applied Machine Learning Papers - a VikramSingh178 Collection

updated Dec 18, 2024

Reading List (Mainly Focused on VLMs and Diffusion Models)

Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 18
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Paper • 2311.15127 • Published Nov 25, 2023 • 13
Learning Transferable Visual Models From Natural Language Supervision

Paper • 2103.00020 • Published Feb 26, 2021 • 11
U-Net: Convolutional Networks for Biomedical Image Segmentation

Paper • 1505.04597 • Published May 18, 2015 • 9
Denoising Diffusion Probabilistic Models

Paper • 2006.11239 • Published Jun 19, 2020 • 3
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Paper • 2112.10741 • Published Dec 20, 2021 • 3
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22, 2024 • 22
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Paper • 2307.01952 • Published Jul 4, 2023 • 83
Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 24
PonderNet: Learning to Ponder

Paper • 2107.05407 • Published Jul 12, 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

Paper • 2106.10270 • Published Jun 18, 2021 • 3
Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation

Paper • 2403.07500 • Published Mar 12, 2024
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

Paper • 2305.14720 • Published May 24, 2023 • 2
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 78
Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Paper • 2405.17414 • Published May 27, 2024 • 10
Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published May 30, 2024 • 35
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Paper • 2301.08243 • Published Jan 19, 2023 • 6
Revisiting Feature Prediction for Learning Visual Representations from Video

Paper • 2404.08471 • Published Feb 15, 2024 • 1
Guiding Instruction-based Image Editing via Multimodal Large Language Models

Paper • 2309.17102 • Published Sep 29, 2023 • 3
SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper • 2402.13929 • Published Feb 21, 2024 • 27
Guiding a Diffusion Model with a Bad Version of Itself

Paper • 2406.02507 • Published Jun 4, 2024 • 16
I4VGen: Image as Stepping Stone for Text-to-Video Generation

Paper • 2406.02230 • Published Jun 4, 2024 • 17
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11, 2024 • 31
Graph Neural Networks Gone Hogwild

Paper • 2407.00494 • Published Jun 29, 2024
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Paper • 2211.14758 • Published Nov 27, 2022 • 1
DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14, 2024 • 26
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Paper • 2403.12008 • Published Mar 18, 2024 • 20
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment

Paper • 2310.11513 • Published Oct 17, 2023 • 1
InstructVideo: Instructing Video Diffusion Models with Human Feedback

Paper • 2312.12490 • Published Dec 19, 2023 • 17
Semi-Parametric Neural Image Synthesis

Paper • 2204.11824 • Published Apr 25, 2022 • 1
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2, 2024 • 54
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

Paper • 2310.19512 • Published Oct 30, 2023 • 16
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Paper • 2310.12190 • Published Oct 18, 2023 • 10
PALP: Prompt Aligned Personalization of Text-to-Image Models

Paper • 2401.06105 • Published Jan 11, 2024 • 49
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Paper • 2402.15504 • Published Feb 23, 2024 • 21
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

Paper • 2408.03209 • Published Aug 6, 2024 • 22
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Paper • 2312.10300 • Published Dec 16, 2023 • 1
Colorful Diffuse Intrinsic Image Decomposition in the Wild

Paper • 2409.13690 • Published Sep 20, 2024 • 13
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

Paper • 2410.10629 • Published Oct 14, 2024 • 11
Large Language Models Reflect the Ideology of their Creators

Paper • 2410.18417 • Published Oct 24, 2024
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Paper • 2410.19355 • Published Oct 25, 2024 • 23
How Far is Video Generation from World Model: A Physical Law Perspective

Paper • 2411.02385 • Published Nov 4, 2024 • 33
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation

Paper • 2411.04997 • Published Nov 7, 2024 • 37
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

Paper • 2411.07126 • Published Nov 11, 2024 • 28
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 139
BrushEdit: All-In-One Image Inpainting and Editing

Paper • 2412.10316 • Published Dec 13, 2024 • 33