# Sarashina2.2-3B
This repository provides large language models trained by SB Intuitions.
## How to use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the model in bfloat16 and place it automatically on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.2-3b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)

# Prompt: "Good morning, today's weather is"
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

for t in text:
    print(t)
```
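Each element of `text` is a dict whose `generated_text` field contains the prompt followed by one sampled continuation; with `num_return_sequences=3`, three such candidates are returned.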
## Model Description
We constructed the Sarashina2.2-3B model, which consists of about 3 billion parameters (excluding embeddings and the LM head), using a three-phase training process. First, we trained the model on 10 trillion tokens of Japanese, English, and code data extracted from web corpora. Next, we trained it on synthetic data to improve its performance on math and coding tasks. Finally, we trained it on a small amount of data to enhance its performance on various application tasks.
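The parameter-counting convention above (embeddings and LM head excluded) can be checked directly with transformers. The sketch below is illustrative rather than part of the official card: it relies only on the standard `get_input_embeddings()`/`get_output_embeddings()` accessors, and the exact non-embedding count depends on whether the LM head is tied to the input embeddings.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: count parameters with and without the embedding/LM-head matrices.
model = AutoModelForCausalLM.from_pretrained(
    "sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16
)

total = sum(p.numel() for p in model.parameters())

# get_input_embeddings()/get_output_embeddings() are standard transformers accessors;
# the LM head may be tied to the input embeddings, so deduplicate parameters by id.
embedding_modules = [
    m for m in (model.get_input_embeddings(), model.get_output_embeddings()) if m is not None
]
excluded_ids = {id(p) for m in embedding_modules for p in m.parameters()}
non_embedding = sum(p.numel() for p in model.parameters() if id(p) not in excluded_ids)

print(f"total parameters:         {total:,}")
print(f"non-embedding parameters: {non_embedding:,}")
```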
The following table shows the model's performance on Japanese tasks. For reference, we also present the performance of our previous LLMs. As shown in the table, Sarashina2.2-3B outperforms Sarashina2-7B on Japanese QA tasks such as NIILC and JMMLU. In addition, Sarashina2.2-3B outperforms Sarashina2-70B on Japanese math and coding tasks such as MGSM-ja and JHumanEval.
### Evaluation on Japanese tasks
| Model | NIILC | JMMLU | MGSM-ja | JHumanEval |
|---|---|---|---|---|
| Sarashina2-7B | 61.4 | 42.5 | 8.4 | 12.8 |
| Sarashina2-70B | 65.4 | 62.7 | 54.0 | 22.0 |
| Sarashina2.2-0.5B | 33.9 | 28.8 | 21.6 | 15.2 |
| Sarashina2.2-1B | 47.2 | 38.2 | 39.6 | 20.7 |
| Sarashina2.2-3B | 63.0 | 52.7 | 63.6 | 39.0 |
## Ethical Considerations and Limitations
This repository contains the pre-trained model, which has not yet been tuned to follow instructions. Therefore, this model may generate meaningless sequences, inaccurate statements, or biased/objectionable outputs. As post-trained versions of Sarashina2.2, we have published Sarashina2.2-0.5B-instruct-v0.1, Sarashina2.2-1B-instruct-v0.1, and Sarashina2.2-3B-instruct-v0.1.
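For applications that need instruction following, those post-trained variants are the better starting point. The snippet below is a minimal, unofficial sketch of prompting Sarashina2.2-3B-instruct-v0.1 through the standard transformers chat-template API; the repository ID is assumed to mirror the base model's naming pattern, and the generation settings are illustrative, so consult that model's card for the recommended usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID, following the base model's naming scheme.
model_name = "sbintuitions/sarashina2.2-3b-instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build the prompt with the tokenizer's chat template (assumes the instruct model ships one).
messages = [{"role": "user", "content": "自己紹介をしてください。"}]  # "Please introduce yourself."
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generation settings are illustrative; see the instruct model's card for recommended values.
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```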
## License