Tanvir1337's Collections
Papers
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
•
2402.17764
•
Published
•
603
Paper
•
2401.04088
•
Published
•
159
Paper
•
2310.06825
•
Published
•
47
Don't Make Your LLM an Evaluation Benchmark Cheater
Paper
•
2311.01964
•
Published
•
1
Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure
Synthetic Data
Paper
•
2107.10833
•
Published
•
2
Pretraining on the Test Set Is All You Need
Paper
•
2309.08632
•
Published
•
3
Petals: Collaborative Inference and Fine-tuning of Large Models
Paper
•
2209.01188
•
Published
•
2
MemGPT: Towards LLMs as Operating Systems
Paper
•
2310.08560
•
Published
•
7
YaRN: Efficient Context Window Extension of Large Language Models
Paper
•
2309.00071
•
Published
•
65
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Paper
•
2310.08528
•
Published
•
3
Not All Attention Is All You Need
Paper
•
2104.04692
•
Published
•
2
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Paper
•
2309.12288
•
Published
•
3
Exploring the MIT Mathematics and EECS Curriculum Using Large Language
Models
Paper
•
2306.08997
•
Published
•
10
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Paper
•
2309.06126
•
Published
•
16
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
•
2307.09288
•
Published
•
242
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN
Fine-Tuning
Paper
•
2307.02053
•
Published
•
23
Attention Is All You Need
Paper
•
1706.03762
•
Published
•
44
RepoFusion: Training Code Models to Understand Your Repository
Paper
•
2306.10998
•
Published
•
14
FlashAttention: Fast and Memory-Efficient Exact Attention with
IO-Awareness
Paper
•
2205.14135
•
Published
•
11
Effective Long-Context Scaling of Foundation Models
Paper
•
2309.16039
•
Published
•
30
The Internal State of an LLM Knows When It's Lying
Paper
•
2304.13734
•
Published
•
3
GPT-Fathom: Benchmarking Large Language Models to Decipher the
Evolutionary Path towards GPT-4 and Beyond
Paper
•
2309.16583
•
Published
•
13
Retentive Network: A Successor to Transformer for Large Language Models
Paper
•
2307.08621
•
Published
•
170
Direct Preference Optimization: Your Language Model is Secretly a Reward
Model
Paper
•
2305.18290
•
Published
•
48
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper
•
2310.09263
•
Published
•
39
Llemma: An Open Language Model For Mathematics
Paper
•
2310.10631
•
Published
•
50
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Paper
•
2309.17421
•
Published
•
4
Take a Step Back: Evoking Reasoning via Abstraction in Large Language
Models
Paper
•
2310.06117
•
Published
•
3
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation
via Large Language Models
Paper
•
2309.10730
•
Published
•
2
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Paper
•
2310.13289
•
Published
•
17
Scaling up GANs for Text-to-Image Synthesis
Paper
•
2303.05511
•
Published
•
4
Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
Paper
•
2308.09713
•
Published
•
2
Text-to-3D using Gaussian Splatting
Paper
•
2309.16585
•
Published
•
31
HyperHuman: Hyper-Realistic Human Generation with Latent Structural
Diffusion
Paper
•
2310.08579
•
Published
•
14
MVDream: Multi-view Diffusion for 3D Generation
Paper
•
2308.16512
•
Published
•
102
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with
Point Cloud Priors
Paper
•
2310.08529
•
Published
•
17
AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image
Collections
Paper
•
2309.02186
•
Published
•
21
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic
Image Design and Generation
Paper
•
2310.08541
•
Published
•
17
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video
Generation
Paper
•
2309.15818
•
Published
•
19
PixArt-α: Fast Training of Diffusion Transformer for
Photorealistic Text-to-Image Synthesis
Paper
•
2310.00426
•
Published
•
61
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Paper
•
2301.11757
•
Published
•
3
An Emulator for Fine-Tuning Large Language Models using Small Language
Models
Paper
•
2310.12962
•
Published
•
14
Sheared LLaMA: Accelerating Language Model Pre-training via Structured
Pruning
Paper
•
2310.06694
•
Published
•
4
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper
•
2310.17680
•
Published
•
69
Multimodal ChatGPT for Medical Applications: an Experimental Study of
GPT-4V
Paper
•
2310.19061
•
Published
•
8
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language
Modeling Likewise
Paper
•
2310.19019
•
Published
•
9
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language
Models
Paper
•
2309.12284
•
Published
•
18
Does GPT-4 Pass the Turing Test?
Paper
•
2310.20216
•
Published
•
17
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper
•
2311.00176
•
Published
•
8
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation
Paper
•
2311.00272
•
Published
•
9
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation,
Generation and Editing
Paper
•
2311.00571
•
Published
•
40
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models
without Specific Tuning
Paper
•
2307.04725
•
Published
•
64
Improved Baselines with Visual Instruction Tuning
Paper
•
2310.03744
•
Published
•
37
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper
•
2005.11401
•
Published
•
12
Levels of AGI: Operationalizing Progress on the Path to AGI
Paper
•
2311.02462
•
Published
•
33
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of
Large Language Models for Code Generation
Paper
•
2305.01210
•
Published
•
4
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and
reusing ModulEs
Paper
•
2311.04901
•
Published
•
7
GPT4All: An Ecosystem of Open Source Compressed Language Models
Paper
•
2311.04931
•
Published
•
20
CogVLM: Visual Expert for Pretrained Language Models
Paper
•
2311.03079
•
Published
•
23
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper
•
2311.03285
•
Published
•
28
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper
•
2311.05437
•
Published
•
47
Can ChatGPT Assess Human Personalities? A General Evaluation Framework
Paper
•
2303.01248
•
Published
•
1
Deep Unlearning via Randomized Conditionally Independent Hessians
Paper
•
2204.07655
•
Published
•
1
Zephyr: Direct Distillation of LM Alignment
Paper
•
2310.16944
•
Published
•
122
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper
•
2309.11235
•
Published
•
16
Learning Temporal Coherence via Self-Supervision for GAN-based Video
Generation
Paper
•
1811.09393
•
Published
•
1
SDXL: Improving Latent Diffusion Models for High-Resolution Image
Synthesis
Paper
•
2307.01952
•
Published
•
82
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide
Resolution
Paper
•
2306.15794
•
Published
•
17
ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical
Domain Knowledge
Paper
•
2303.14070
•
Published
•
11
Orca 2: Teaching Small Language Models How to Reason
Paper
•
2311.11045
•
Published
•
70
Video-LLaVA: Learning United Visual Representation by Alignment Before
Projection
Paper
•
2311.10122
•
Published
•
26
GAIA: a benchmark for General AI Assistants
Paper
•
2311.12983
•
Published
•
184
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper
•
2311.13600
•
Published
•
42
Scalable Extraction of Training Data from (Production) Language Models
Paper
•
2311.17035
•
Published
•
4
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI
Feedback
Paper
•
2309.00267
•
Published
•
47
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Paper
•
2311.16079
•
Published
•
20
Magicoder: Source Code Is All You Need
Paper
•
2312.02120
•
Published
•
79
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Paper
•
2312.02119
•
Published
•
1
Hyena Hierarchy: Towards Larger Convolutional Language Models
Paper
•
2302.10866
•
Published
•
7
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Paper
•
2312.04724
•
Published
•
20
Switch Transformers: Scaling to Trillion Parameter Models with Simple
and Efficient Sparsity
Paper
•
2101.03961
•
Published
•
14
VILA: On Pre-training for Visual Language Models
Paper
•
2312.07533
•
Published
•
20
LLM360: Towards Fully Transparent Open-Source LLMs
Paper
•
2312.06550
•
Published
•
57
VecFusion: Vector Font Generation with Diffusion
Paper
•
2312.10540
•
Published
•
21
Silkie: Preference Distillation for Large Visual Language Models
Paper
•
2312.10665
•
Published
•
11
Osprey: Pixel Understanding with Visual Instruction Tuning
Paper
•
2312.10032
•
Published
•
4
Gemini: A Family of Highly Capable Multimodal Models
Paper
•
2312.11805
•
Published
•
45
Holistic Evaluation of Language Models
Paper
•
2211.09110
•
Published
•
1
Lost in Translation: A Study of Bugs Introduced by Large Language Models
while Translating Code
Paper
•
2308.03109
•
Published
•
1
StarCoder: may the source be with you!
Paper
•
2305.06161
•
Published
•
30
LM-Cocktail: Resilient Tuning of Language Models via Model Merging
Paper
•
2311.13534
•
Published
•
3
Principled Instructions Are All You Need for Questioning LLaMA-1/2,
GPT-3.5/4
Paper
•
2312.16171
•
Published
•
34
DocLLM: A layout-aware generative language model for multimodal document
understanding
Paper
•
2401.00908
•
Published
•
181
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
•
2401.02954
•
Published
•
41
LLaMA Pro: Progressive LLaMA with Block Expansion
Paper
•
2401.02415
•
Published
•
53
Fast Conformer with Linearly Scalable Attention for Efficient Speech
Recognition
Paper
•
2305.05084
•
Published
•
1
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper
•
2401.01325
•
Published
•
26
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by
Few-Shot Grounding on Wikipedia
Paper
•
2305.14292
•
Published
•
1
I am a Strange Dataset: Metalinguistic Tests for Language Models
Paper
•
2401.05300
•
Published
•
4
Self-Rewarding Language Models
Paper
•
2401.10020
•
Published
•
144
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
•
2401.12945
•
Published
•
86
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices
Paper
•
2311.16567
•
Published
•
22
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for
Instruction Fine-Tuning
Paper
•
2402.04833
•
Published
•
6
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts
Models
Paper
•
2402.07033
•
Published
•
16
GraphCast: Learning skillful medium-range global weather forecasting
Paper
•
2212.12794
•
Published
•
1
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Paper
•
2402.10176
•
Published
•
35
GES: Generalized Exponential Splatting for Efficient Radiance Field
Rendering
Paper
•
2402.10128
•
Published
•
16
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Paper
•
2402.13929
•
Published
•
28
Neural Circuit Diagrams: Robust Diagrams for the Communication,
Implementation, and Analysis of Deep Learning Architectures
Paper
•
2402.05424
•
Published
•
17
Sora: A Review on Background, Technology, Limitations, and Opportunities
of Large Vision Models
Paper
•
2402.17177
•
Published
•
88
Adapting Large Language Models via Reading Comprehension
Paper
•
2309.09530
•
Published
•
77
Grandmaster-Level Chess Without Search
Paper
•
2402.04494
•
Published
•
67
TransformerFAM: Feedback attention is working memory
Paper
•
2404.09173
•
Published
•
43
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image
Generation
Paper
•
2403.16990
•
Published
•
25
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper
•
2404.10667
•
Published
•
17
Leave No Context Behind: Efficient Infinite Context Transformers with
Infini-attention
Paper
•
2404.07143
•
Published
•
103
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Paper
•
2405.11473
•
Published
•
53
Routing to the Expert: Efficient Reward-guided Ensemble of Large
Language Models
Paper
•
2311.08692
•
Published
•
12
DataComp-LM: In search of the next generation of training sets for
language models
Paper
•
2406.11794
•
Published
•
49
What If We Recaption Billions of Web Images with LLaMA-3?
Paper
•
2406.08478
•
Published
•
39
De-DSI: Decentralised Differentiable Search Index
Paper
•
2404.12237
•
Published
•
2
G-Rank: Unsupervised Continuous Learn-to-Rank for Edge Devices in a P2P
Network
Paper
•
2301.12530
•
Published
•
1
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper
•
2406.20094
•
Published
•
95
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal
Dataset with One Trillion Tokens
Paper
•
2406.11271
•
Published
•
20
The Carbon Footprint of Machine Learning Training Will Plateau, Then
Shrink
Paper
•
2204.05149
•
Published
•
7
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs
with Nothing
Paper
•
2406.08464
•
Published
•
65
Robust Speech Recognition via Large-Scale Weak Supervision
Paper
•
2212.04356
•
Published
•
23
Qwen2-Audio Technical Report
Paper
•
2407.10759
•
Published
•
55
Qwen2.5-Coder Technical Report
Paper
•
2409.12186
•
Published
•
136
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level
Mathematical Reasoning
Paper
•
2410.02884
•
Published
•
50
MobileLLM: Optimizing Sub-billion Parameter Language Models for
On-Device Use Cases
Paper
•
2402.14905
•
Published
•
126