The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • Published Feb 27, 2024 • 573 upvotes
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model • arXiv:2402.03766 • Published Feb 6, 2024 • 9 upvotes
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases • arXiv:2312.15011 • Published Dec 22, 2023 • 15 upvotes
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution • arXiv:2401.00935 • Published Jan 1, 2024 • 16 upvotes
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis • arXiv:2311.12454 • Published Nov 21, 2023 • 27 upvotes
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • arXiv:2311.12024 • Published Nov 20, 2023 • 16 upvotes
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning • arXiv:2311.12631 • Published Nov 21, 2023 • 12 upvotes
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • arXiv:2311.10794 • Published Nov 17, 2023 • 22 upvotes
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • arXiv:2311.10709 • Published Nov 17, 2023 • 24 upvotes
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework • arXiv:2311.10125 • Published Nov 16, 2023 • 4 upvotes
SoundCam: A Dataset for Finding Humans Using Room Acoustics • arXiv:2311.03517 • Published Nov 6, 2023 • 9 upvotes
Levels of AGI: Operationalizing Progress on the Path to AGI • arXiv:2311.02462 • Published Nov 4, 2023 • 30 upvotes
The Generative AI Paradox: "What It Can Create, It May Not Understand" • arXiv:2311.00059 • Published Oct 31, 2023 • 17 upvotes
SALMONN: Towards Generic Hearing Abilities for Large Language Models • arXiv:2310.13289 • Published Oct 20, 2023 • 16 upvotes
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models • arXiv:2310.08659 • Published Oct 12, 2023 • 20 upvotes
MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-Task Learning • arXiv:2310.09478 • Published Oct 14, 2023 • 17 upvotes
DECO: Dense Estimation of 3D Human-Scene Contact in the Wild • arXiv:2309.15273 • Published Sep 26, 2023 • 7 upvotes
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models • arXiv:2309.16414 • Published Sep 28, 2023 • 19 upvotes
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis • arXiv:2310.00426 • Published Sep 30, 2023 • 60 upvotes
MMICL: Empowering Vision-Language Model with Multi-Modal In-Context Learning • arXiv:2309.07915 • Published Sep 14, 2023 • 4 upvotes
AstroLLaMA: Towards Specialized Foundation Models in Astronomy • arXiv:2309.06126 • Published Sep 12, 2023 • 16 upvotes
Textbooks Are All You Need II: phi-1.5 Technical Report • arXiv:2309.05463 • Published Sep 11, 2023 • 84 upvotes
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting • arXiv:2309.04269 • Published Sep 8, 2023 • 29 upvotes
FACET: Fairness in Computer Vision Evaluation Benchmark • arXiv:2309.00035 • Published Aug 31, 2023 • 13 upvotes
Nougat: Neural Optical Understanding for Academic Documents • arXiv:2308.13418 • Published Aug 25, 2023 • 33 upvotes
Relightable and Animatable Neural Avatar from Sparse-View Video • arXiv:2308.07903 • Published Aug 15, 2023 • 9 upvotes
Teach LLMs to Personalize: An Approach Inspired by Writing Education • arXiv:2308.07968 • Published Aug 15, 2023 • 24 upvotes