nicholasKluge committed
Commit · aaab92f
1 Parent(s): be4f228
Update README.md
README.md CHANGED
@@ -14,19 +14,11 @@ tags:
 - assistant
 pipeline_tag: text-generation
 widget:
-- text: <|startofinstruction|>
-  example_title: Greetings
-- text: >-
-    <|startofinstruction|>Can you explain what is Machine
-    Learning?<|endofinstruction|>
+- text: "<|startofinstruction|>Can you explain what is Machine Learning?<|endofinstruction|>"
   example_title: Machine Learning
-- text:
-    <|startofinstruction|>Do you know anything about virtue
-    ethics?<|endofinstruction|>
+- text: "<|startofinstruction|>Do you know anything about virtue ethics?<|endofinstruction|>"
   example_title: Ethics
-- text:
-    <|startofinstruction|>How can I make my girlfriend
-    happy?<|endofinstruction|>
+- text: "<|startofinstruction|>How can I make my girlfriend happy?<|endofinstruction|>"
   example_title: Advise
 inference:
   parameters:
@@ -114,16 +106,17 @@ The model will output something like:
 
 ## Evaluation
 
-
-
-
-
-
-
-
-
-
-
+|Model (GPT-2) |Average |[ARC](https://arxiv.org/abs/1803.05457) |[TruthfulQA](https://arxiv.org/abs/2109.07958) |[ToxiGen](https://arxiv.org/abs/2203.09509) |
+| ---------------------------------------------------------------------- | -------- | -------------------------------------- | --------------------------------------------- | ------------------------------------------ |
+|[Aira-2-124M-DPO](https://huggingface.co/nicholasKluge/Aira-2-124M-DPO) |**40.68** |**24.66** |**42.61** |**54.79** |
+|[Aira-2-124M](https://huggingface.co/nicholasKluge/Aira-2-124M) |38.07 |24.57 |41.02 |48.62 |
+|GPT-2 |35.37 |21.84 |40.67 |43.62 |
+|[Aira-2-355M](https://huggingface.co/nicholasKluge/Aira-2-355M) |**39.68** |**27.56** |38.53 |**53.19** |
+|GPT-2-medium |36.43 |27.05 |**40.76** |41.49 |
+|[Aira-2-774M](https://huggingface.co/nicholasKluge/Aira-2-774M) |**42.26** |**28.75** |**41.33** |**56.70** |
+|GPT-2-large |35.16 |25.94 |38.71 |40.85 |
+|[Aira-2-1B5](https://huggingface.co/nicholasKluge/Aira-2-1B5) |**42.22** |28.92 |**41.16** |**56.60** |
+|GPT-2-xl |36.84 |**30.29** |38.54 |41.70 |
 
 * Evaluations were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)).
 
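
The evaluation table added in this commit does not say how the Average column is derived. The sketch below checks one plausible reading: that Average is the arithmetic mean of the ARC, TruthfulQA, and ToxiGen scores, truncated (not rounded) to two decimals. That reading is an assumption, not something the README states, and the helper names (`to_hundredths`, `truncated_mean`, `mismatches`) are hypothetical. Scores are handled as integer hundredths to avoid floating-point noise.

```python
# Scores copied from the table in the diff, as (Average, ARC, TruthfulQA, ToxiGen).
ROWS = {
    "Aira-2-124M-DPO": ("40.68", "24.66", "42.61", "54.79"),
    "Aira-2-124M": ("38.07", "24.57", "41.02", "48.62"),
    "GPT-2": ("35.37", "21.84", "40.67", "43.62"),
    "Aira-2-355M": ("39.68", "27.56", "38.53", "53.19"),
    "GPT-2-medium": ("36.43", "27.05", "40.76", "41.49"),
    "Aira-2-774M": ("42.26", "28.75", "41.33", "56.70"),
    "GPT-2-large": ("35.16", "25.94", "38.71", "40.85"),
    "Aira-2-1B5": ("42.22", "28.92", "41.16", "56.60"),
    "GPT-2-xl": ("36.84", "30.29", "38.54", "41.70"),
}


def to_hundredths(score: str) -> int:
    """Convert a two-decimal score string such as '24.66' to the integer 2466."""
    whole, frac = score.split(".")
    return int(whole) * 100 + int(frac)


def truncated_mean(scores) -> int:
    """Mean of the scores in hundredths, truncated (ASSUMED convention)."""
    return sum(to_hundredths(s) for s in scores) // len(scores)


def mismatches(rows=ROWS):
    """Return {model: (listed, computed)} where the listed average differs."""
    found = {}
    for model, (listed, *benchmarks) in rows.items():
        computed = truncated_mean(benchmarks)
        if computed != to_hundredths(listed):
            found[model] = (listed, f"{computed // 100}.{computed % 100:02d}")
    return found


if __name__ == "__main__":
    for model, (listed, computed) in mismatches().items():
        print(f"{model}: listed {listed}, computed {computed}")
```

Under this assumption, eight of the nine rows reproduce their listed average; only the Aira-2-355M row differs (listed 39.68, computed 39.76). Whether that reflects a typo or a different averaging convention cannot be determined from the diff alone.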