---
license: cc-by-sa-4.0
---

# **Synatra-7B-v0.3-Translation🐧**

![Synatra-7B-v0.3-Translation](./Synatra.png)

## Support Me

Synatra is a personal project, developed with the resources of a single individual. If you like the model, how about contributing a small amount toward research costs?

[Buy me a Coffee](https://www.buymeacoffee.com/mwell)

Wanna be a sponsor? (Please) Contact me on Telegram: **AlzarTakkarsen**

# **License**

This model is strictly for [*non-commercial*](https://creativecommons.org/licenses/by-sa/4.0/) (**cc-by-sa-4.0**) use.

# **Model Details**

**Base Model**
[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)

**Datasets**
[sharegpt_deepl_ko_translation](https://huggingface.co/datasets/squarelike/sharegpt_deepl_ko_translation)

A filtered version of the above dataset was used for training.

**Trained On**
A100 80GB × 1

**Instruction Format**

The model follows the [ChatML](https://github.com/openai/openai-python/blob/main/chatml.md) format and the **Alpaca (no-input)** format. The system prompt selects the translation direction. English to Korean:

```
<|im_start|>system
주어진 문장을 한국어로 번역해라.<|im_end|>
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
```

Korean to English:

```
<|im_start|>system
주어진 문장을 영어로 번역해라.<|im_end|>
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
```

## Ko-LLM-Leaderboard

Benchmarking in progress...

# **Implementation Code**

Since the chat_template already encodes the instruction format above, you can use the code below.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("maywell/Synatra-7B-v0.3-Translation")
tokenizer = AutoTokenizer.from_pretrained("maywell/Synatra-7B-v0.3-Translation")

messages = [
    {"role": "user", "content": "바나나는 원래 하얀색이야?"},  # "Are bananas originally white?"
]

# Apply the ChatML template and open the assistant turn for generation.
encodeds = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
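
For the documented translation use (English to Korean), a minimal sketch building on the snippet above; it assumes the bundled chat_template accepts a system turn, as the ChatML format above implies, and the example sentence is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

model = AutoModelForCausalLM.from_pretrained("maywell/Synatra-7B-v0.3-Translation").to(device)
tokenizer = AutoTokenizer.from_pretrained("maywell/Synatra-7B-v0.3-Translation")

messages = [
    # System prompt from the instruction-format section: "Translate the given sentence into Korean."
    {"role": "system", "content": "주어진 문장을 한국어로 번역해라."},
    # Illustrative input sentence, not from the model card.
    {"role": "user", "content": "The weather is nice today, so let's go for a walk."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

generated_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```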
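
If you serve the model through a runtime that takes raw prompt strings rather than chat messages, the ChatML template above can be assembled by hand. A minimal sketch; `build_chatml_prompt` is a hypothetical helper and the Korean example sentence is illustrative:

```python
def build_chatml_prompt(system: str, instruction: str) -> str:
    """Assemble the ChatML prompt documented in the instruction-format section."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Korean -> English direction, per the second template above.
prompt = build_chatml_prompt(
    "주어진 문장을 영어로 번역해라.",       # "Translate the given sentence into English."
    "시나트라는 개인 프로젝트로 개발되고 있습니다.",  # illustrative input sentence
)
print(prompt)
```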