-
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 16 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 35 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Paper • 1804.07461 • Published • 4
Collections
Discover the best community collections!
Collections including paper arxiv:2404.02060
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 32 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 44 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 35 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 60
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 32 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 35 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 2 -
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1
-
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
Paper • 2311.00871 • Published • 2 -
Can large language models explore in-context?
Paper • 2403.15371 • Published • 32 -
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Paper • 2205.05055 • Published • 2 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 35
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 54 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 24 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 67 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 72
-
Training-Free Long-Context Scaling of Large Language Models
Paper • 2402.17463 • Published • 19 -
Evaluating Very Long-Term Conversational Memory of LLM Agents
Paper • 2402.17753 • Published • 18 -
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 22 -
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Paper • 2403.09347 • Published • 20
-
Evaluating Very Long-Term Conversational Memory of LLM Agents
Paper • 2402.17753 • Published • 18 -
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper • 2402.16671 • Published • 26 -
Do Large Language Models Latently Perform Multi-Hop Reasoning?
Paper • 2402.16837 • Published • 24 -
Divide-or-Conquer? Which Part Should You Distill Your LLM?
Paper • 2402.15000 • Published • 22
-
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49 -
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 134 -
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 18
-
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Paper • 2306.00107 • Published • 3 -
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Paper • 2309.08730 • Published • 1 -
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Paper • 2402.16153 • Published • 56 -
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
Paper • 2401.11944 • Published • 26
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 16 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 10 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 64