|
---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- Sailor2-8B-Chat
---
|
Quantizations of https://huggingface.co/sail/Sailor2-8B-Chat |
|
|
|
### Inference Clients/UIs

* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [ollama](https://github.com/ollama/ollama)
* [jan](https://github.com/janhq/jan)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [GPT4All](https://github.com/nomic-ai/gpt4all)
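
Any of these clients can load the GGUF files from this repo. As a quick illustration, here is a minimal sketch using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings for llama.cpp; the quant filename below is a placeholder, so substitute whichever file you downloaded:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Sailor2-8B-Chat.Q4_K_M.gguf",  # hypothetical filename; use your downloaded quant
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if built with GPU support; 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI assistant named Sailor2, created by Sea AI Lab."},
        {"role": "user", "content": "Beri saya pengenalan singkat tentang model bahasa besar."},
    ],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

`n_gpu_layers=-1` offloads every layer to the GPU when llama-cpp-python is built with GPU support; set it to `0` for CPU-only inference.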
|
--- |
|
|
|
# From original readme |
|
|
|
Sailor2 is a community-driven initiative that brings cutting-edge multilingual language models to South-East Asia (SEA). Our research highlights a strong demand for models in the **8B and 20B parameter** range for production use, alongside **1B models** for specialized applications such as speculative decoding and research purposes. These models, released under the **Apache 2.0 license**, provide enhanced accessibility to advanced language technologies across the region.
|
|
|
|
|
Sailor2 builds upon the foundation of the awesome multilingual model [Qwen 2.5](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e) and is continuously pre-trained on **500B tokens** to better support **15 languages** with a unified model. These languages include English, Chinese, Burmese, Cebuano, Ilocano, Indonesian, Javanese, Khmer, Lao, Malay, Sundanese, Tagalog, Thai, Vietnamese, and Waray. By addressing the growing demand for diverse, robust, and accessible language models, Sailor2 seeks to serve underserved SEA areas with open, inclusive, and accessible multilingual LLMs. The Sailor2 model comes in three sizes, 1B, 8B, and 20B, which are **expanded from the Qwen2.5 base models** of 0.5B, 7B, and 14B, respectively.
|
|
|
## Model Summary |
|
- **Model Collections:** [Base Model & Chat Model](https://huggingface.co/collections/sail/sailor2-language-models-674d7c9e6b4dbbd9a869906b)
- **Project Website:** [sea-sailor.github.io/blog/sailor2/](https://sea-sailor.github.io/blog/sailor2/)
- **Codebase:** [github.com/sail-sg/sailor2](https://github.com/sail-sg/sailor2)
- **Technical Report:** Coming Soon
|
|
|
|
|
## Training details |
|
|
|
During development, we employ a range of advanced technologies to ensure top-tier performance and efficiency:

1. model expansion (see the sketch after this section)
2. optimized data mixing strategies
3. multi-stage pre-training protocols
4. advanced multilingual post-training
|
|
|
Please refer to [Sailor2 Blog](https://sea-sailor.github.io/blog/sailor2/) for more training details. |
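
As a rough illustration of the first item, "model expansion" generally means growing a smaller pre-trained checkpoint into a larger one before continued pre-training. The sketch below shows one common variant (depth expansion by duplicating decoder blocks); it is a hypothetical illustration only, not Sailor2's actual recipe, which is documented in the blog:

```python
# Hypothetical sketch of depth-wise model expansion via layer duplication --
# an illustration of the general technique, NOT Sailor2's actual recipe.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
layers = model.model.layers  # nn.ModuleList of decoder blocks

# Duplicate every fourth block; continued pre-training then lets the
# duplicated blocks diverge from their originals.
expanded = nn.ModuleList()
for i, block in enumerate(layers):
    expanded.append(block)
    if i % 4 == 3:  # arbitrary expansion ratio, chosen only for this sketch
        expanded.append(copy.deepcopy(block))

model.model.layers = expanded
model.config.num_hidden_layers = len(expanded)
print(f"Expanded from {len(layers)} to {len(expanded)} decoder blocks")
```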
|
|
|
|
|
## Requirements |
|
The code for Sailor2 is included in the latest Hugging Face `transformers`, and we advise you to install `transformers==4.46.3`.
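
A trivial sanity check, shown for completeness, to confirm the installed version matches the pin above:

```python
# Print the installed transformers version; this card pins 4.46.3.
import transformers
print(transformers.__version__)
```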
|
|
|
## Quickstart |
|
|
|
The following code snippet shows how to load the tokenizer and model, and how to generate content.
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # device to place the tokenized inputs on

# Load the chat model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    'sail/Sailor2-8B-Chat',
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained('sail/Sailor2-8B-Chat')

system_prompt = \
'You are an AI assistant named Sailor2, created by Sea AI Lab. \
As an AI assistant, you can answer questions in English, Chinese, and Southeast Asian languages \
such as Burmese, Cebuano, Ilocano, Indonesian, Javanese, Khmer, Lao, Malay, Sundanese, Tagalog, Thai, Vietnamese, and Waray. \
Your responses should be friendly, unbiased, informative, detailed, and faithful.'

# The same request ("Give me a short introduction to large language models.")
# in Indonesian, Vietnamese, and Thai:
prompt = "Beri saya pengenalan singkat tentang model bahasa besar."
# prompt = "Hãy cho tôi một giới thiệu ngắn gọn về mô hình ngôn ngữ lớn."
# prompt = "ให้ฉันแนะนำสั้น ๆ เกี่ยวกับโมเดลภาษาขนาดใหญ่"

# Apply the chat template to build the model's prompt string.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
)

# Keep only the newly generated tokens, dropping the echoed prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```