AkimfromParis
committed on
Update font display source license v1.2
src/about.py CHANGED (+35 -92)
@@ -90,9 +90,7 @@ LLM_BENCHMARKS_TEXT = f"""
 ## How it works
 📈 We evaluate Japanese Large Language Models on 52 key benchmarks leveraging our evaluation tool [llm-jp-eval](https://github.com/llm-jp/llm-jp-eval), a unified framework to evaluate Japanese LLMs on various evaluation tasks.
 
-
-
-- **NLI (Natural Language Inference)**
+**NLI (Natural Language Inference)**
 
 * `Jamp`, a Japanese NLI benchmark focused on temporal inference [Source](https://github.com/tomo-ut/temporalNLI_dataset) (License CC BY-SA 4.0)
 
@@ -104,126 +102,71 @@ Benchmarks:
 
 * `JSICK`, Japanese Sentences Involving Compositional Knowledge [Source](https://github.com/verypluming/JSICK) (License CC BY-SA 4.0)
 
-
-
-###JEMHopQA
-
-[Source](https://github.com/aiishii/JEMHopQA)
-(License CC BY-SA 4.0)
-
-###NIILC
-
-[Source](https://github.com/mynlp/niilc-qa)
-(License CC BY-SA 4.0)
-
-###JAQKET (AIO)
-
-[Source](https://www.nlp.ecei.tohoku.ac.jp/projects/jaqket/)
-(License CC BY-SA 4.0)(Other licenses are required for corporate usage)
-
-- **RC (Reading Comprehension)**
-
-###JSQuAD
-
-[Source](https://github.com/yahoojapan/JGLUE)
-(License CC BY-SA 4.0)
+**NQA (Question Answering)**
 
--
+* `JEMHopQA`, Japanese Explainable Multi-hop Question Answering [Source](https://github.com/aiishii/JEMHopQA) (License CC BY-SA 4.0)
 
-
+* `NIILC`, NIILC Question Answering Dataset [Source](https://github.com/mynlp/niilc-qa) (License CC BY-SA 4.0)
 
-[Source](https://
-License:MIT License
+* `JAQKET`, Japanese QA dataset on the subject of quizzes [Source](https://www.nlp.ecei.tohoku.ac.jp/projects/jaqket/) (License CC BY-SA 4.0 - Other licenses are required for corporate usage)
 
-
+**RC (Reading Comprehension)**
 
-[Source](https://github.com/yahoojapan/JGLUE)
-(License CC BY-SA 4.0)
+* `JSQuAD`, Japanese version of SQuAD (part of JGLUE) [Source](https://github.com/yahoojapan/JGLUE) (License CC BY-SA 4.0)
 
-
+**MC (Multiple Choice question answering)**
 
-[Source](https://github.com/
-(License CC BY-SA 4.0)
+* `JCommonsenseMorality`, Japanese dataset for evaluating commonsense morality understanding [Source](https://github.com/Language-Media-Lab/commonsense-moral-ja) (License MIT License)
 
-
+* `JCommonsenseQA`, Japanese version of CommonsenseQA [Source](https://github.com/yahoojapan/JGLUE) (License CC BY-SA 4.0)
 
-
+* `KUCI`, Kyoto University Commonsense Inference dataset [Source](https://github.com/ku-nlp/KUCI (License CC BY-SA 4.0)
 
-
-(License CC BY-SA 4.0)
+**EL (Entity Linking)**
 
--
+* `chABSA`, Aspect-Based Sentiment Analysis dataset [Source](https://github.com/chakki-works/chABSA-dataset) (License CC BY-SA 4.0)
 
-
+**FA (Fundamental Analysis)**
 
-[Source](https://github.com/ku-nlp/WikipediaAnnotatedCorpus)
-(License CC BY-SA 4.0)
+* `Wikipedia Annotated Corpus`, [Source](https://github.com/ku-nlp/WikipediaAnnotatedCorpus) (License CC BY-SA 4.0)
 List of tasks:
+- Reading Prediction
+- Named-entity recognition (NER)
+- Dependency Parsing
+- Predicate-argument structure analysis (PAS)
+- Coreference Resolution
 
-
-Named-entity recognition (NER)
-Dependency Parsing
-Predicate-argument structure analysis (PAS)
-Coreference Resolution
+**MR (Mathematical Reasoning)**
 
-
+* `MAWPS`, Japanese version of MAWPS (A Math Word Problem Repository) [Source](https://github.com/nlp-waseda/chain-of-thought-ja-dataset) (License Apache-2.0)
 
-
+* `MGSM`, Japanese part of MGSM (Multilingual Grade School Math Benchmark) [Source](https://huggingface.co/datasets/juletxara/mgsm) (License MIT License)
 
-
-License:Apache-2.0
+**MT (Machine Translation)**
 
-
+* `ALT`, Asian Language Treebank (ALT) - Parallel Corpus [Source](https://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/index.html) (License CC BY-SA 4.0)
 
-[Source](https://
-License:MIT License
+* `WikiCorpus`, Japanese-English Bilingual Corpus of Wikipedia's articles about the city of Kyoto [Source](https://alaginrc.nict.go.jp/WikiCorpus/) (License CC BY-SA 3.0)
 
-
-
-###Asian Language Treebank (ALT) - Parallel Corpus
-
-[Source](https://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/index.html)
-(License CC BY-SA 4.0)
-
-###WikiCorpus (Japanese-English Bilingual Corpus of Wikipedia's articles about the city of Kyoto)
-
-[Source](https://alaginrc.nict.go.jp/WikiCorpus/)
-License:CC BY-SA 3.0 deed
-
-- **STS (Semantic Textual Similarity)**
+**STS (Semantic Textual Similarity)**
 
 This task is supported by llm-jp-eval, but it is not included in the evaluation score average.
 
-
-
-[Source](https://github.com/yahoojapan/JGLUE)
-(License CC BY-SA 4.0)
-
-- **HE (Human Examination)**
-
-###MMLU
-
-[Source](https://github.com/hendrycks/test)
-License:MIT License
-
-###JMMLU
+* `JSTS`, Japanese version of the STS (Semantic Textual Similarity) (part of JGLUE) [Source](https://github.com/yahoojapan/JGLUE) (License CC BY-SA 4.0)
 
-
-License:CC BY-SA 4.0(3 tasks under the CC BY-NC-ND 4.0 license)
+**HE (Human Examination)**
 
-
+* `MMLU`, Measuring Massive Multitask Language Understanding [Source](https://github.com/hendrycks/test) (License MIT License)
 
-
+* `JMMLU`, Japanese Massive Multitask Language Understanding Benchmark [Source](https://github.com/nlp-waseda/JMMLU) (License CC BY-SA 4.0(3 tasks under the CC BY-NC-ND 4.0 license)
 
-
-(License CC BY-SA 4.0)
+**CG (Code Generation)**
 
-
+* `MBPP`, Japanese version of Mostly Basic Python Problems (MBPP) [Source](https://huggingface.co/datasets/llm-jp/mbpp-ja) (License CC BY-SA 4.0)
 
-
+**SUM (Summarization)**
 
-[Source](https://github.com/csebuetnlp/xl-sum)
-License:CC BY-NC-SA 4.0(Due to the non-commercial license, this dataset will not be used, unless you specifically agree to the license and terms of use)
+* `XL-Sum`, XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages [Source](https://github.com/csebuetnlp/xl-sum) (License CC BY-NC-SA 4.0, due to the non-commercial license, this dataset will not be used, unless you specifically agree to the license and terms of use)
 
 
 ## Reproducibility
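The benchmark list carries two notes that affect scoring: STS is supported by llm-jp-eval but left out of the score average, and XL-Sum is not used unless its non-commercial license is accepted. The Python sketch below illustrates how those two rules could be applied to a small catalog before averaging. The `Benchmark` class, the `CATALOG` subset, the helper names, the example scores, and the plain per-dataset mean are all assumptions made for illustration. None of them come from `src/about.py` or from llm-jp-eval, whose actual aggregation may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical record for one entry of the benchmark list."""
    name: str
    category: str
    license: str


# A few entries copied from the list in the diff; this is not the full set.
CATALOG = [
    Benchmark("Jamp", "NLI", "CC BY-SA 4.0"),
    Benchmark("JSQuAD", "RC", "CC BY-SA 4.0"),
    Benchmark("MGSM", "MR", "MIT License"),
    Benchmark("JSTS", "STS", "CC BY-SA 4.0"),
    Benchmark("XL-Sum", "SUM", "CC BY-NC-SA 4.0"),
]

# Per the note in the list, STS is evaluated but not averaged.
EXCLUDED_CATEGORIES = {"STS"}


def is_non_commercial(license_name: str) -> bool:
    """Return True when a Creative Commons code contains the NC element."""
    return "NC" in license_name.replace("-", " ").split()


def averaged_benchmarks(catalog, accept_non_commercial=False):
    """Keep benchmarks that count toward the average under the two rules."""
    return [
        b for b in catalog
        if b.category not in EXCLUDED_CATEGORIES
        and (accept_non_commercial or not is_non_commercial(b.license))
    ]


def average_score(scores, catalog, accept_non_commercial=False):
    """Plain mean over the kept benchmarks that have a score (an assumption)."""
    kept = averaged_benchmarks(catalog, accept_non_commercial)
    values = [scores[b.name] for b in kept if b.name in scores]
    if not values:
        raise ValueError("no scores available for the selected benchmarks")
    return mean(values)


# Made-up scores, only to exercise the functions.
example_scores = {"Jamp": 0.5, "JSQuAD": 0.75, "MGSM": 0.25, "JSTS": 1.0, "XL-Sum": 0.0}
print(average_score(example_scores, CATALOG))  # 0.5
```

With the default settings only `Jamp`, `JSQuAD`, and `MGSM` contribute. Passing `accept_non_commercial=True` also brings `XL-Sum` into the mean, mirroring the opt-in wording of its license note.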