Update README.md
Browse files
README.md
CHANGED
@@ -136,6 +136,9 @@ The models have been fine-tuned on the following datasets.
|
|
136 |
|
137 |
You can view the evaluation results of several LLMs on this [leaderboard](http://wandb.me/llm-jp-leaderboard). We used [llm-jp-eval](https://github.com/llm-jp/llm-jp-eval) (v1.3.0) for the evaluation.
|
138 |
|
|
|
|
|
|
|
139 |
## Risks and Limitations
|
140 |
|
141 |
The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.
|
|
|
136 |
|
137 |
You can view the evaluation results of several LLMs on this [leaderboard](http://wandb.me/llm-jp-leaderboard). We used [llm-jp-eval](https://github.com/llm-jp/llm-jp-eval) (v1.3.0) for the evaluation.
|
138 |
|
139 |
+
Besides, we used LLM-as-a-judge frameworks, [Japanese Vicuna QA Benchmark](https://github.com/ku-nlp/ja-vicuna-qa-benchmark/) and [Japanese MT Bench](https://github.com/Stability-AI/FastChat/tree/jp-stable/fastchat/llm_judge), for evaluation.
|
140 |
+
For details, please refer to [our technical blog](https://llm-jp.nii.ac.jp/blog/2024/04/30/v2.0-release.html) (in Japanese).
|
141 |
+
|
142 |
## Risks and Limitations
|
143 |
|
144 |
The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.
|