Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2405.01535

Running

289

🌍

PDF Chatbot
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 114
MongoDB/tech-news-embeddings

Viewer • Updated Feb 26 • 1.58M • 28 • 4
tarteel-ai/everyayah

Viewer • Updated Dec 9, 2022 • 127k • 22 • 11

Natural Language (LLM, NLP etc)

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18 • 53
FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17 • 32
How Far Can We Go with Practical Function-Level Program Repair?

Paper • 2404.12833 • Published Apr 19 • 6
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Paper • 2404.18796 • Published Apr 29 • 68

Papers - Reward Model - Fine-tuning

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 114

Tailor-made LLM evaluations: custom evaluations for your LLM

Collection of articles and resources focusing on automatic evaluation for LLM's and their role as unbiased judges in assessing other LLMs' outputs

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023 • 28
Generative Judge for Evaluating Alignment

Paper • 2310.05470 • Published Oct 9, 2023 • 1
Humans or LLMs as the Judge? A Study on Judgement Biases

Paper • 2402.10669 • Published Feb 16
JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 32

Papers - KAIST AI

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 60
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models

Paper • 2404.07738 • Published Apr 11 • 2
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 114

Latxa: An Open Language Model and Evaluation Suite for Basque

Paper • 2403.20266 • Published Mar 29 • 3
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10 • 63
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 114
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published May 14 • 27

Papers - University - University of Illinois

Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2 • 43
PointInfinity: Resolution-Invariant Point Diffusion Models

Paper • 2404.03566 • Published Apr 4 • 13
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

Paper • 2404.08252 • Published Apr 12 • 5
SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22 • 23

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Paper • 2403.04132 • Published Mar 7 • 38
Evaluating Very Long-Term Conversational Memory of LLM Agents

Paper • 2402.17753 • Published Feb 27 • 18
The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20 • 16
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20 • 10

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 103
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 38
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 51
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27 • 44

Papers - Institute - Allen Institute

The Curious Case of Neural Text Degeneration

Paper • 1904.09751 • Published Apr 22, 2019 • 3
PIQA: Reasoning about Physical Commonsense in Natural Language

Paper • 1911.11641 • Published Nov 26, 2019 • 2
SocialIQA: Commonsense Reasoning about Social Interactions

Paper • 1904.09728 • Published Apr 22, 2019 • 2
HellaSwag: Can a Machine Really Finish Your Sentence?

Paper • 1905.07830 • Published May 19, 2019 • 4

Previous
1
2
3
4
5
6
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs