---
license: apache-2.0
base_model:
- Rakuten/RakutenAI-7B
---

# RakutenAI-2.0-8x7B

## Model Description

RakutenAI-2.0-8x7B is a Mixture of Experts (MoE) foundation model derived from [RakutenAI-7B](https://huggingface.co/Rakuten/RakutenAI-7B), which was first released in March 2024. As part of a broader initiative to advance Japanese LLM technology, RakutenAI-2.0-8x7B uses an MoE architecture with two active experts, resulting in **13B active parameters**. Experts are selected dynamically based on the input tokens, which improves computational efficiency while maintaining high performance. RakutenAI-2.0-8x7B achieves the highest Japanese score among the models compared below and is competitive on English evaluation tasks against similar models, including Swallow-MX-8x7B-NVE-0.1, Llama-3-Swallow-70B-v0.1, Sarashina2-70B, and PLaMo 100B.
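
To make "dynamic expert selection with two active experts" concrete, here is a minimal sketch of generic top-2 routing for a single token. It is an illustration only: this card does not describe the model's actual router, and the expert count of 8 (suggested by the model name), the toy experts, the logits, and the names `top2_route` and `moe_layer` are all assumptions made for the example.

```python
import math
from typing import Callable, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(router_logits: List[float]) -> List[Tuple[int, float]]:
    """Pick the two highest-scoring experts for one token.

    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    # Rank experts by router score and keep the best two.
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top2 = ranked[:2]
    # Renormalize over the selected experts only.
    weights = softmax([router_logits[i] for i in top2])
    return list(zip(top2, weights))


def moe_layer(
    token: List[float],
    router_logits: List[float],
    experts: List[Callable[[List[float]], List[float]]],
) -> List[float]:
    """Weighted sum of the outputs of the two selected experts."""
    out = [0.0] * len(token)
    for idx, weight in top2_route(router_logits):
        expert_out = experts[idx](token)  # the other experts are never evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Hypothetical toy setup: 8 experts, each scaling its input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

route = top2_route(router_logits)
output = moe_layer([1.0, -1.0], router_logits, experts)
print(route)
print(output)
```

Only the two selected experts run for each token, which is why the number of active parameters is much smaller than the total parameter count.
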

*If you are looking for an instruction-tuned model, check [RakutenAI-2.0-8x7B-instruct](https://huggingface.co/Rakuten/RakutenAI-2.0-8x7B-instruct)*.

## Model Evaluation Results

| Foundation Model Name | Japanese Score | English Score | Average |
|------------------------------------|----------------|---------------|-----------|
| Rakuten/RakutenAI-7B | 62.93 | 34.86 | 48.90 |
| **Rakuten/RakutenAI-2.0-8x7B** | **72.29** | 41.32 | 56.80 |
| Tokyotech/Swallow-MX-8x7B-NVE-0.1 | 66.17 | 44.33 | 55.25 |
| Tokyotech/Llama-3-Swallow-70B-v0.1 | 68.15 | **51.52** | **59.84** |
| SBIntuitions/Sarashina2-70B | 71.09 | 39.22 | 55.16 |
| PreferredNetworks/PLaMo 100B | 71.45 | 36.48 | 53.96 |

<div style="text-align: center;">Table 1: RakutenAI-2.0-8x7B foundation model average performance scores on LM-Harness in comparison with other Japanese open models.</div>

Detailed scores are as follows:

| Metric | jcommonsense_qa | jnli | marc_ja | jsquad | jaqket_v2 | xlsum_ja | xwinograd | mgsm | arc_challenge | hellaswag | mmlu | truthfulqa_mc2 | gsm8k | winogrande | musr | math_hard | gpqa | bbh | ifeval | mmlu_pro |
|----------------------|-----------------|-------|---------|--------|-----------|----------|-----------|-------|---------------|-----------|-------|----------------|-------|------------|-------|-----------|-------|-------|--------|----------|
| **Model Name** | accuracy-3shot | accuracy-3shot | accuracy-3shot | exact_match-2shot | exact_match-1shot | rouge2-1shot | accuracy-0shot | accuracy-5shot | accuracy_norm-25shot | accuracy_norm-10shot | accuracy-5shot | accuracy-0shot | exact_match-5shot | accuracy-5shot | accuracy_norm-0shot | exact_match-4shot | accuracy_norm-0shot | accuracy_norm-3shot | avg_inst_prompt_strict_acc-0shot | accuracy-5shot |
| RakutenAI-7B | 85.88 | 56.61 | 96.52 | 69.56 | 81.44 | 15.69 | 74.14 | 23.60 | 60.75 | 82.26 | 59.83 | 38.33 | 32.60 | 77.43 | 4.93 | 2.16 | 5.02 | 20.34 | 14.04 | 20.57 |
| RakutenAI-2.0-8x7B | 93.12 | 87.43 | 97.72 | 74.49 | 86.00 | 15.70 | 78.62 | 45.20 | 66.38 | 85.84 | 65.50 | 48.19 | 51.40 | 80.51 | 13.88 | 3.30 | 5.71 | 27.02 | 22.90 | 25.22 |
| Swallow-MX-8x7B-NVE-0.1 | 89.28 | 43.06 | 97.15 | 76.29 | 87.37 | 17.09 | 82.69 | 40.40 | 65.87 | 85.13 | 69.48 | 50.38 | 58.45 | 82.87 | 8.78 | 7.50 | 13.33 | 29.41 | 28.38 | 32.32 |
| Llama-3-Swallow-70B-v0.1 | 92.58 | 66.15 | 93.46 | 70.94 | 71.74 | 12.58 | 83.32 | 54.40 | 67.58 | 87.53 | 77.47 | 55.29 | 81.50 | 85.16 | 22.05 | 13.92 | 16.60 | 49.53 | 20.91 | 40.70 |
| Sarashina2-70B | 95.35 | 60.44 | 94.50 | 76.90 | 88.49 | 18.24 | 80.81 | 54.00 | 62.63 | 83.23 | 63.10 | 48.68 | 24.49 | 79.95 | 13.52 | 5.29 | 5.54 | 29.73 | 30.32 | 24.13 |
| PLaMo 100B | 92.05 | 68.82 | 97.49 | 78.01 | 89.43 | 20.38 | 81.02 | 44.40 | 49.91 | 80.98 | 55.17 | 44.91 | 56.10 | 71.35 | 6.67 | 0.00 | 4.00 | 23.99 | 23.39 | 21.31 |

<div style="text-align: center;">Table 2: RakutenAI-2.0-8x7B foundation model performance on LM-Harness metrics in comparison with other Japanese open models.</div>
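
The aggregate scores in the first table appear to be unweighted means of the per-task scores in the second: the first 8 columns (`jcommonsense_qa` through `mgsm`) for the Japanese score, the remaining 12 for the English score, and the mean of those two for the average. This is an inference from the published numbers, not a documented procedure, so treat the sketch below as a consistency check under that assumption. The names `TABLE2`, `N_JAPANESE`, and `aggregate` are made up for the example.

```python
from statistics import mean
from typing import List, Tuple

# Per-task scores copied from the detailed table, in column order.
TABLE2 = {
    "RakutenAI-7B": [
        85.88, 56.61, 96.52, 69.56, 81.44, 15.69, 74.14, 23.60,
        60.75, 82.26, 59.83, 38.33, 32.60, 77.43,
        4.93, 2.16, 5.02, 20.34, 14.04, 20.57,
    ],
    "RakutenAI-2.0-8x7B": [
        93.12, 87.43, 97.72, 74.49, 86.00, 15.70, 78.62, 45.20,
        66.38, 85.84, 65.50, 48.19, 51.40, 80.51,
        13.88, 3.30, 5.71, 27.02, 22.90, 25.22,
    ],
}

# Assumption: the first 8 columns are the Japanese tasks, the rest are English.
N_JAPANESE = 8


def aggregate(scores: List[float]) -> Tuple[float, float, float]:
    """Return (Japanese mean, English mean, mean of the two)."""
    japanese = mean(scores[:N_JAPANESE])
    english = mean(scores[N_JAPANESE:])
    return japanese, english, (japanese + english) / 2


for name, scores in TABLE2.items():
    ja, en, avg = aggregate(scores)
    print(f"{name}: Japanese={ja:.3f} English={en:.3f} Average={avg:.3f}")
```

Under this assumption the computed values match the published aggregates to within rounding for both Rakuten models.
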

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Rakuten/RakutenAI-2.0-8x7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# torch_dtype="auto" loads the weights in the dtype stored in the checkpoint;
# device_map="auto" (requires accelerate) places them on the available devices.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
model.eval()

# This is a foundation model, so each prompt is a text prefix to be continued.
requests = [
    "南硫黄島原生自然環境保全地域は、自然",
    "The capybara is a giant cavy rodent",
]

for req in requests:
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(req, return_tensors="pt").to(model.device)
    tokens = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # The decoded text contains the prompt followed by the continuation.
    out = tokenizer.decode(tokens[0], skip_special_tokens=True)
    print("INPUT:\n" + req)
    print("OUTPUT:\n" + out)
```

**Note on Evaluation Scores:**

- Evaluations were run with LM Evaluation Harness between October and December 2024, using the default task definitions from the following commit: https://github.com/EleutherAI/lm-evaluation-harness/commit/26f607f5432e1d09c55b25488c43523e7ecde657
- The tasks considered for Japanese evaluations are listed here: https://github.com/EleutherAI/lm-evaluation-harness/blob/26f607f5432e1d09c55b25488c43523e7ecde657/lm_eval/tasks/japanese_leaderboard/README.md
- The tasks considered for English evaluations are listed here: https://huggingface.co/docs/leaderboards/en/open_llm_leaderboard/archive and https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/README.md
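
For readers who want to run a comparable evaluation, the sketch below shows one way to call the harness from Python. It is not the authors' evaluation script. It assumes `lm-eval` is installed from the commit referenced above, and that `japanese_leaderboard` is a valid task group name at that commit, as the linked README path suggests. The helpers `build_eval_kwargs` and `run_eval` are hypothetical. `num_fewshot` is deliberately left unset so that each task's default few-shot setting applies.

```python
from typing import Dict, List


def build_eval_kwargs(model_id: str, task_groups: List[str]) -> Dict[str, object]:
    """Hypothetical helper: keyword arguments for lm_eval.simple_evaluate."""
    return {
        "model": "hf",  # Hugging Face transformers backend
        "model_args": f"pretrained={model_id},dtype=auto",
        "tasks": task_groups,
        "batch_size": 1,
    }


def run_eval(model_id: str, task_groups: List[str]) -> dict:
    """Run the harness; needs lm-eval installed and enough GPU memory for the model."""
    import lm_eval  # imported lazily so build_eval_kwargs works without lm-eval

    results = lm_eval.simple_evaluate(**build_eval_kwargs(model_id, task_groups))
    return results["results"]


kwargs = build_eval_kwargs("Rakuten/RakutenAI-2.0-8x7B", ["japanese_leaderboard"])
print(kwargs["model_args"])
```

Calling `run_eval("Rakuten/RakutenAI-2.0-8x7B", ["japanese_leaderboard"])` would start the actual evaluation. Scores may differ from the tables above depending on hardware, batch size, and library versions.
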

## Model Details

* **Developed by**: [Rakuten Group, Inc.](https://ai.rakuten.com/)
* **Language(s)**: Japanese, English
* **License**: This model is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
* **Model Architecture**: Mixture of Experts (2 active experts)

### Limitations and Bias

The suite of RakutenAI-2.0 models is capable of generating human-like text on a wide range of topics. However, like all LLMs, they have limitations and can produce biased, inaccurate, or unsafe outputs. Please exercise caution and judgement while interacting with them.

## Citation

For citing our work on the suite of RakutenAI-2.0 models, please use:

```
@misc{rakutengroup2025rakutenai2.0,
  author = {Rakuten Group, Inc.},
  title = {RakutenAI-2.0},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Rakuten},
}
```