Sarashina2-8x70B

This repository provides large language models trained by SB Intuitions.

Required Hardware

BF16 Inference:

16x H100
16x A100 80GB
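
As a rough sanity check on the hardware list above, the weight footprint can be estimated from the parameter count given in the model description. This is only a back-of-the-envelope sketch: it assumes exactly 450 billion parameters (the description says "over 450 billion", so this is a lower bound), 2 bytes per BF16 parameter, and 80 GB of memory per GPU (stated for the A100, assumed here for the H100). The constant and function names are illustrative.

```python
# Back-of-the-envelope estimate of BF16 weight memory versus available GPU memory.
# Assumptions (not from the model card): exactly 450e9 parameters, 80 GB per GPU.
PARAMS = 450e9          # lower bound: the model has "over 450 billion" parameters
BYTES_PER_BF16 = 2      # bfloat16 stores each parameter in 16 bits
GPU_MEM_GB = 80         # assumed per-GPU memory
NUM_GPUS = 16


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_BF16):
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


weights_gb = weight_memory_gb(PARAMS)
total_gb = NUM_GPUS * GPU_MEM_GB
headroom_gb = total_gb - weights_gb  # left for KV cache, activations, and buffers

print(f"weights: {weights_gb:.0f} GB, cluster: {total_gb} GB, headroom: {headroom_gb:.0f} GB")
```

Under these assumptions the weights alone need about 900 GB, so eight 80 GB GPUs (640 GB) would not hold them, while sixteen leave some room for the KV cache and activations.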

Model Description

We built Sarashina2-8x70B, a Mixture-of-Experts model with over 450 billion parameters, by applying the sparse upcycling technique to our Sarashina2-70B model, which let us construct the Mixture-of-Experts model efficiently. We trained Sarashina2-8x70B on a mix of Japanese and English corpora drawn from web data.
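
To illustrate the general idea of sparse upcycling, the toy sketch below initializes every expert of a Mixture-of-Experts layer as a copy of a dense feed-forward block and adds a freshly initialized router. It is not the training code used for this model: the function name `upcycle_ffn`, the tiny weight matrices, the router initialization scale, and the choice of 8 experts (suggested by the model name but not stated in this card) are all assumptions made for illustration.

```python
import copy
import random


def upcycle_ffn(dense_ffn, num_experts, hidden_size, seed=0):
    """Toy sparse upcycling of one feed-forward block (hypothetical helper).

    Each expert starts as an independent copy of the dense block's weights.
    The router has no dense counterpart, so it is initialized randomly.
    """
    rng = random.Random(seed)
    experts = [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
    # One row of routing weights per expert; the 0.02 scale is an assumption.
    router = [
        [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
        for _ in range(num_experts)
    ]
    return {"experts": experts, "router": router}


# A tiny stand-in for a dense feed-forward block with hidden size 2.
dense_ffn = {
    "w_in": [[0.1, 0.2], [0.3, 0.4]],
    "w_out": [[0.5, 0.6], [0.7, 0.8]],
}

moe_layer = upcycle_ffn(dense_ffn, num_experts=8, hidden_size=2)
print(len(moe_layer["experts"]), "experts initialized from the dense block")
```

Because every expert begins with the dense model's weights, the upcycled model starts from what the dense model already learned, and further training lets the experts diverge.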

Tokenization

We use a SentencePiece tokenizer with a unigram language model and byte fallback. We do not apply pre-tokenization with a Japanese tokenizer, so users can feed raw sentences directly into the tokenizer.
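
The effect of byte fallback can be shown with a small, self-contained sketch: a piece that is missing from the vocabulary is replaced by one token per UTF-8 byte, written in the `<0xXX>` form that SentencePiece uses for byte pieces. This toy function does not perform unigram segmentation and does not use the real Sarashina2 vocabulary; the function name, the input pieces, and the three-entry vocabulary are hypothetical.

```python
def encode_with_byte_fallback(pieces, vocab):
    """Map pre-segmented pieces to tokens, falling back to UTF-8 bytes.

    Toy illustration only: a real unigram tokenizer also chooses the
    segmentation itself; here the pieces are given.
    """
    tokens = []
    for piece in pieces:
        if piece in vocab:
            tokens.append(piece)
        else:
            # Unknown piece: emit one byte token per UTF-8 byte instead of <unk>.
            tokens.extend(f"<0x{b:02X}>" for b in piece.encode("utf-8"))
    return tokens


# Hypothetical vocabulary that lacks the rare character "𠮷".
toy_vocab = {"日本", "語", "▁"}
tokens = encode_with_byte_fallback(["日本", "語", "𠮷"], toy_vocab)
print(tokens)
```

Since every string can be written as UTF-8 bytes, this scheme never needs an unknown token, which is part of why raw text can be fed to the tokenizer directly.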

Ethical Considerations and Limitations

Sarashina2 has not yet been tuned to follow instructions. It may therefore generate meaningless sequences, inaccurate statements, or biased or objectionable outputs. Before using Sarashina2, we would like developers to tune the model based on human preferences and safety considerations.

License

Sarashina Model NonCommercial License Agreement
