# Sarashina2.2-3B
This repository provides large language models trained by SB Intuitions.
## How to use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the model in bfloat16 and place it automatically on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.2-3b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)

# Prompt: "Good morning, today's weather is"
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

for t in text:
    print(t)
```
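Each element of `text` is a dict whose `generated_text` field contains the prompt followed by one sampled continuation; with `num_return_sequences=3`, three such candidates are returned.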
## Model Description
We constructed the Sarashina2.2-3B model, which consists of about 3 billion parameters (excluding embeddings and the LM head), using a three-phase training process. First, we trained the model on 10 trillion tokens of Japanese, English, and code data extracted from web corpora. Next, we trained it on synthetic data to improve its performance on math and coding tasks. Finally, we trained it on a small amount of data to enhance its performance on various application tasks.
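The parameter-counting convention above (embeddings and LM head excluded) can be checked directly with transformers. The sketch below is illustrative rather than part of the official card: it relies only on the standard `get_input_embeddings()`/`get_output_embeddings()` accessors, and the exact non-embedding count depends on whether the LM head is tied to the input embeddings.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: count parameters with and without the embedding/LM-head matrices.
model = AutoModelForCausalLM.from_pretrained(
    "sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16
)

total = sum(p.numel() for p in model.parameters())

# get_input_embeddings()/get_output_embeddings() are standard transformers accessors;
# the LM head may be tied to the input embeddings, so deduplicate parameters by id.
embedding_modules = [
    m for m in (model.get_input_embeddings(), model.get_output_embeddings()) if m is not None
]
excluded_ids = {id(p) for m in embedding_modules for p in m.parameters()}
non_embedding = sum(p.numel() for p in model.parameters() if id(p) not in excluded_ids)

print(f"total parameters:         {total:,}")
print(f"non-embedding parameters: {non_embedding:,}")
```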
The following table shows the model's performance on Japanese tasks. For reference, we also present the performance of our previous LLMs. As shown in the table, Sarashina2.2-3B outperforms Sarashina2-7B on Japanese QA tasks such as NIILC and JMMLU. In addition, Sarashina2.2-3B outperforms Sarashina2-70B on Japanese math and coding tasks such as MGSM-ja and JHumanEval.
### Evaluation on Japanese tasks
| Model | NIILC | JMMLU | MGSM-ja | JHumanEval |
|---|---|---|---|---|
| Sarashina2-7B | 61.4 | 42.5 | 8.4 | 12.8 |
| Sarashina2-70B | 65.4 | 62.7 | 54.0 | 22.0 |
| Sarashina2.2-0.5B | 33.9 | 28.8 | 21.6 | 15.2 |
| Sarashina2.2-1B | 47.2 | 38.2 | 39.6 | 20.7 |
| Sarashina2.2-3B | 63.0 | 52.7 | 63.6 | 39.0 |
## Ethical Considerations and Limitations
This repository contains the pre-trained model, which has not yet been tuned to follow instructions. Therefore, this model may generate meaningless sequences, inaccurate statements, or biased/objectionable outputs. As post-trained versions of Sarashina2.2, we have published Sarashina2.2-0.5B-instruct-v0.1, Sarashina2.2-1B-instruct-v0.1, and Sarashina2.2-3B-instruct-v0.1.
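For applications that need instruction following, those post-trained variants are the better starting point. The snippet below is a minimal, unofficial sketch of prompting Sarashina2.2-3B-instruct-v0.1 through the standard transformers chat-template API; the repository ID is assumed to mirror the base model's naming pattern, and the generation settings are illustrative, so consult that model's card for the recommended usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID, following the base model's naming scheme.
model_name = "sbintuitions/sarashina2.2-3b-instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build the prompt with the tokenizer's chat template (assumes the instruct model ships one).
messages = [{"role": "user", "content": "自己紹介をしてください。"}]  # "Please introduce yourself."
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generation settings are illustrative; see the instruct model's card for recommended values.
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```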
## License