vincelwt committed on
Commit
12e3264
1 Parent(s): 8cd07d9
Files changed (1)
  1. app/page.js +5 -5
app/page.js CHANGED
@@ -9,22 +9,22 @@ export default async function Leaderboard() {
     <>
       <p>
         Traditional LLMs benchmarks have drawbacks: they quickly become part of
-        training datasets and are hard to relate-to in terms of real-world
+        training datasets and are hard to relate to in terms of real-world
         use-cases.
       </p>
       <p>
-        I made this as an experiment to address these issues. Here the dataset
+        I made this as an experiment to address these issues. Here, the dataset
         is dynamic (changes every week) and composed of crowdsourced real-world
         prompts.
       </p>
       <p>
         We then use GPT-4 to grade each model's response against a set of
         rubrics (more details on the about page). The prompt dataset is easily
-        explorable as the score is only 1 dimension.
+        explorable.
       </p>
       <p>
-        The results are stored in Postgres database and those are the raw
-        results.
+        Everything is then stored in a Postgres database and this page shows the
+        raw results.
       </p>

       <br />