Files changed (1)
  1. README.md +118 -2
README.md CHANGED
@@ -25,11 +25,114 @@ datasets:
  - Intel/orca_dpo_pairs
  - unalignment/toxic-dpo-v0.1
  - jondurbin/truthy-dpo-v0.1
- - allenai/ultrafeedback_binarized_cleaned
+ - allenai/ultrafeedback_binarized_cleaned
  - Squish42/bluemoon-fandom-1-1-rp-cleaned
  - LDJnr/Capybara
  - JULIELab/EmoBank
  - kingbri/PIPPA-shareGPT
+ model-index:
+ - name: bagel-dpo-8x7b-v0.2
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 72.1
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-8x7b-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 86.41
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-8x7b-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 70.27
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-8x7b-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 72.83
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-8x7b-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 83.27
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-8x7b-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 50.04
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jondurbin/bagel-dpo-8x7b-v0.2
+       name: Open LLM Leaderboard
  ---

  # A bagel, with everything
@@ -513,4 +616,17 @@ I am not a lawyer, so I can't help determine if this is actually commercially vi
  - If the dataset was released under a permissive license, but actually includes OpenAI generated data, does that ToS supersede the license?
  - Does the dataset fall completely under fair use anyways, since the model isn't really capable of reproducing the entire training set verbatim?

- Use your best judgement and seek legal advice if you are concerned about the terms. In any case, by using this model, you agree to completely indemnify me.
+ Use your best judgement and seek legal advice if you are concerned about the terms. In any case, by using this model, you agree to completely indemnify me.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jondurbin__bagel-dpo-8x7b-v0.2)
+
+ | Metric                          |Value|
+ |---------------------------------|----:|
+ |Avg.                             |72.49|
+ |AI2 Reasoning Challenge (25-Shot)|72.10|
+ |HellaSwag (10-Shot)              |86.41|
+ |MMLU (5-Shot)                    |70.27|
+ |TruthfulQA (0-shot)              |72.83|
+ |Winogrande (5-shot)              |83.27|
+ |GSM8k (5-shot)                   |50.04|
+
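For reference, the `Avg.` row in the table added above appears to be the plain unweighted mean of the six benchmark scores, which is how the Open LLM Leaderboard reports its aggregate; a minimal sketch checking that arithmetic against the values in the diff:

```python
# Sanity check (assumption: "Avg." is the unweighted mean of the six benchmark
# scores reported in the model-index block and results table above).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.10,
    "HellaSwag (10-Shot)": 86.41,
    "MMLU (5-Shot)": 70.27,
    "TruthfulQA (0-shot)": 72.83,
    "Winogrande (5-shot)": 83.27,
    "GSM8k (5-shot)": 50.04,
}

average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # prints "Avg. = 72.49", matching the table
```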