Sarashina2-8x70B
This repository provides large language models trained by SB Intuitions.
Required Hardware
BF16 Inference:
- 16x H100
- 16x A100 80GB
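The 16-GPU requirement can be sanity-checked with a rough estimate of the memory needed for the weights alone. The sketch below takes the parameter count from the model description ("over 450 billion", so a lower bound) and assumes 2 bytes per BF16 parameter and 80 GB per device. The 80 GB figure is stated for the A100 and is only assumed for the H100. The function names are illustrative.

```python
# Rough weight-memory estimate for BF16 inference (a sketch, not a measurement).
PARAMS = 450e9            # lower bound: "over 450 billion parameters"
BYTES_PER_PARAM_BF16 = 2  # BF16 stores each parameter in 16 bits
GPU_MEMORY_GB = 80        # assumption: 80 GB per device (A100 80GB; assumed for H100)


def weights_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def total_gpu_gb(num_gpus, per_gpu_gb=GPU_MEMORY_GB):
    """Aggregate device memory across all GPUs."""
    return num_gpus * per_gpu_gb


for n in (8, 16):
    headroom = total_gpu_gb(n) - weights_gb()
    print(f"{n} GPUs: {total_gpu_gb(n):.0f} GB total, headroom {headroom:.0f} GB")
```

Under these assumptions the weights alone exceed the combined memory of 8 devices, while 16 devices leave room for the KV cache and activations. Actual headroom will be smaller because the true parameter count is above 450 billion.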
Model Description
Sarashina2-8x70B is a Mixture-of-Experts model with over 450 billion parameters. We built it efficiently by applying the sparse upcycling technique to our Sarashina2-70B model, and trained it on a mix of Japanese and English corpora from web data.
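As a rough illustration of sparse upcycling in general, the sketch below copies one dense feed-forward block into every expert and adds a freshly initialized router. It is a toy under stated assumptions, not the training code used for this model: the expert count of 8 is inferred from the "8x" in the model name, and the router initialization, the top-2 routing, and all function and key names are hypothetical.

```python
import copy
import random


def upcycle_ffn(dense_ffn, num_experts, hidden_size, seed=0):
    """Turn one dense FFN into an MoE layer whose experts start as identical copies."""
    rng = random.Random(seed)
    # Every expert starts from the dense weights (deep copies, so they can diverge).
    experts = [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
    # The router has no dense counterpart, so it is initialized from scratch.
    # The standard deviation of 0.02 is an arbitrary illustrative choice.
    router = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
              for _ in range(num_experts)]
    return {"experts": experts, "router": router}


def route(moe, token, top_k=2):
    """Return the indices of the top_k experts for one token vector."""
    scores = [sum(w * x for w, x in zip(row, token)) for row in moe["router"]]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]


# Tiny stand-in for a dense FFN block (real weights are large matrices).
dense = {"w_in": [[0.1, 0.2], [0.3, 0.4]], "w_out": [[0.5, 0.6], [0.7, 0.8]]}
moe = upcycle_ffn(dense, num_experts=8, hidden_size=2)
print(route(moe, token=[1.0, -1.0]))
```

Because only the feed-forward blocks are replicated while the other layers are shared, the total parameter count ends up below eight times that of the dense model, which is consistent with the "over 450 billion" figure above.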
Tokenization
We use a SentencePiece tokenizer with a unigram language model and byte fallback. We do not apply pre-tokenization with a Japanese tokenizer, so users can feed raw sentences directly into the tokenizer.
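To illustrate what byte fallback provides, the toy sketch below splits text character by character and replaces any character missing from a small hypothetical vocabulary with its UTF-8 bytes, written in the `<0xXX>` piece format that SentencePiece uses. This is not the unigram segmentation itself, and the vocabulary and function names are made up for the example.

```python
def encode_with_byte_fallback(text, vocab):
    """Character-level toy: out-of-vocabulary characters become UTF-8 byte pieces."""
    pieces = []
    for ch in text:
        if ch in vocab:
            pieces.append(ch)
        else:
            # One piece per byte, e.g. "<0xE3>", so no input maps to an unknown token.
            pieces.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return pieces


def decode_pieces(pieces):
    """Invert the encoding by reassembling the byte pieces into UTF-8 text."""
    buf = bytearray()
    for p in pieces:
        if len(p) == 6 and p.startswith("<0x") and p.endswith(">"):
            buf.append(int(p[3:5], 16))
        else:
            buf.extend(p.encode("utf-8"))
    return buf.decode("utf-8")


toy_vocab = {"日", "本", "語"}  # hypothetical vocabulary for illustration
pieces = encode_with_byte_fallback("日本語あ", toy_vocab)
print(pieces)
print(decode_pieces(pieces))
```

Since every character can be expressed as bytes, raw text round-trips without loss even when it contains characters the vocabulary has never seen.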
Ethical Considerations and Limitations
Sarashina2 has not yet been tuned to follow instructions. It may therefore generate meaningless sequences, inaccurate content, or biased or objectionable outputs. Before using Sarashina2, we ask developers to tune the model based on human preferences and safety considerations.
License