Commit 9853939 • 1 Parent(s): de201d8
Fixed typos and minor edits (#1)
- Fixed typos and minor edits (d7040420ed91d2308e8911de9910e4d8b54a67ae)
Co-authored-by: Amit Dhurandhar <sadhamanus@users.noreply.huggingface.co>
- assets/header.md +5 -3
assets/header.md CHANGED
@@ -1,6 +1,8 @@
-<h1 style='text-align: center; color: black;'>π₯ Ranking LLMs without
+<h1 style='text-align: center; color: black;'>π₯ Ranking LLMs without Ground Truth </h1>
 
 
-This space demonstrates
+This space demonstrates ranking of large language models with access to just input prompts (viz. only questions in Q&A tasks) as described in our 2024 ACL Findings paper [Ranking Large Language Models without Ground Truth](https://arxiv.org/abs/2402.14860). <br>
 
-
+[Source code](https://huggingface.co/spaces/ibm/llm-rank-themselves/tree/main) is included as part of this space. Installation and usage instructions are provided below.
+
+Inspired by real life, where both an expert and a knowledgeable person can identify a novice, the main idea is to consider triplets of models, where each one evaluates the other two, correctly identifying the worst model in the triplet with high probability. Iteratively performing such evaluations yields an estimated ranking that doesn't require ground truth/reference data, which can be expensive to gather. The methods offer a viable low-resource ranking mechanism for practical use. <br>
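
To illustrate the triplet idea described in the added paragraph, here is a minimal sketch of how such a ranking could be aggregated. It is not the space's source code or the paper's actual method: `rank_without_ground_truth`, `answer`, and `judge` are hypothetical names, and the vote-counting scheme is an assumption made only for illustration.

```python
from collections import defaultdict
from itertools import combinations

def rank_without_ground_truth(models, prompts, answer, judge):
    """Rank models best-first using only input prompts (no reference answers).

    `answer(model, prompt)` and `judge(evaluator, ans_a, ans_b)` are assumed
    callables: `judge` returns 0 if the first answer looks better to the
    evaluator, else 1. Both are illustrative stand-ins, not the paper's
    actual evaluation prompts.
    """
    worst_votes = defaultdict(int)
    for prompt in prompts:
        answers = {m: answer(m, prompt) for m in models}
        # Consider every triplet of models; each member judges the other two.
        for triplet in combinations(models, 3):
            for evaluator in triplet:
                x, y = [m for m in triplet if m != evaluator]
                # Whichever answer the evaluator rejects gets a 'worst' vote.
                loser = y if judge(evaluator, answers[x], answers[y]) == 0 else x
                worst_votes[loser] += 1
    # Models voted 'worst' least often are ranked highest.
    return sorted(models, key=lambda m: worst_votes[m])
```

This is only a toy aggregation of triplet verdicts; see the paper and the space's source code for the methods actually evaluated.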