Collections
Collections including paper arxiv:2405.01535
-
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 53
FlowMind: Automatic Workflow Generation with LLMs
Paper • 2404.13050 • Published • 32
How Far Can We Go with Practical Function-Level Program Repair?
Paper • 2404.12833 • Published • 6
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Paper • 2404.18796 • Published • 68
-
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 28
Generative Judge for Evaluating Alignment
Paper • 2310.05470 • Published • 1
Humans or LLMs as the Judge? A Study on Judgement Biases
Paper • 2402.10669 • Published
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 32
-
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 60
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Paper • 2404.07738 • Published • 2
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 114
-
Latxa: An Open Language Model and Evaluation Suite for Basque
Paper • 2403.20266 • Published • 3
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 63
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 114
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Paper • 2405.08707 • Published • 27
-
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 43
PointInfinity: Resolution-Invariant Point Diffusion Models
Paper • 2404.03566 • Published • 13
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Paper • 2404.08252 • Published • 5
SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 23
-
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Paper • 2403.04132 • Published • 38
Evaluating Very Long-Term Conversational Memory of LLM Agents
Paper • 2402.17753 • Published • 18
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 16
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 10
-
Jamba: A Hybrid Transformer-Mamba Language Model
Paper • 2403.19887 • Published • 103
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 38
ViTAR: Vision Transformer with Any Resolution
Paper • 2403.18361 • Published • 51
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Paper • 2403.18814 • Published • 44
-
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 2
SocialIQA: Commonsense Reasoning about Social Interactions
Paper • 1904.09728 • Published • 2
HellaSwag: Can a Machine Really Finish Your Sentence?
Paper • 1905.07830 • Published • 4