Sarashina2.2-3B

This repository provides large language models trained by SB Intuitions.

How to use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the model in bfloat16 and place it automatically on the available device(s).
model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.2-3b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)  # fix the random seed for reproducible sampling

# Sample three continuations of the prompt "おはようございます、今日の天気は"
# ("Good morning, today's weather is").
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

# The pipeline returns a list of dicts, each with a "generated_text" key.
for t in text:
    print(t)
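
If you prefer not to use the pipeline wrapper, the same generation can be done with model.generate directly. This is a minimal sketch under the setup above; the generation parameters are illustrative and not part of the official example.

inputs = tokenizer("おはようございます、今日の天気は", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))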

Model Description

We constructed the Sarashina2.2-3B model, which has about 3 billion parameters (embeddings and the LM head are excluded from this count), using a three-phase training process. First, we trained the model on 10 trillion tokens, including Japanese, English, and code data extracted from web corpora. Next, we trained it on synthetic data to improve its performance on math and coding tasks. Finally, we trained it on a small amount of data to enhance its performance on various application tasks.
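
As a rough check of that counting convention, the sketch below tallies parameters with and without embedding/LM-head weights using transformers; the name filters ("embed", "lm_head") are assumptions about the checkpoint's parameter naming, not something stated in this card.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16)

# Count all parameters, then exclude embedding and LM-head weights.
total = sum(p.numel() for p in model.parameters())
non_embedding = sum(
    p.numel()
    for name, p in model.named_parameters()
    if "embed" not in name and "lm_head" not in name  # assumed parameter naming
)
print(f"total: {total / 1e9:.2f}B, excluding embeddings/LM head: {non_embedding / 1e9:.2f}B")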

The following table shows the model's performance on Japanese tasks. For reference, we also present the performance of our previous LLMs. As the table shows, Sarashina2.2-3B outperforms Sarashina2-7B on Japanese QA tasks such as NIILC and JMMLU. In addition, Sarashina2.2-3B outperforms Sarashina2-70B on Japanese math and coding tasks such as MGSM-ja and JHumanEval.

Evaluation on Japanese tasks

Model               NIILC   JMMLU   MGSM-ja   JHumanEval
Sarashina2-7B        61.4    42.5       8.4         12.8
Sarashina2-70B       65.4    62.7      54.0         22.0
Sarashina2.2-0.5B    33.9    28.8      21.6         15.2
Sarashina2.2-1B      47.2    38.2      39.6         20.7
Sarashina2.2-3B      63.0    52.7      63.6         39.0

Ethical Considerations and Limitations

This repository contains the pre-trained model, which has not yet been tuned to follow instructions. As a result, it may generate meaningless sequences, factually inaccurate text, or biased/objectionable outputs. We have also published post-trained (instruction-tuned) Sarashina2.2 models: Sarashina2.2-0.5B-instruct-v0.1, Sarashina2.2-1B-instruct-v0.1, and Sarashina2.2-3B-instruct-v0.1.
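
If you need instruction-following behavior, the instruct variants above are the intended entry point. Below is a minimal sketch of loading one of them, assuming its Hub path mirrors the base model's (sbintuitions/sarashina2.2-3b-instruct-v0.1) and that its tokenizer ships a standard chat template; the prompt and generation settings are illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sbintuitions/sarashina2.2-3b-instruct-v0.1"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Format a single user turn with the tokenizer's chat template (assumed to exist).
messages = [{"role": "user", "content": "こんにちは"}]  # "Hello"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=100, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))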

License

MIT License
