---
license: mit
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0c845a04a514ba62bcd1a/RFpsPxlc_3cK0kmWj-tYR.png)

# **Introduction**

We introduce Motif, a new language model family from [**Moreh**](https://moreh.io/), specialized in Korean and English.
Motif-102B-Instruct is a chat model tuned from the base model [Motif-102B](https://huggingface.co/moreh/Motif-102B).

## Training Platform

- Motif-102B was trained on the [**MoAI platform**](https://moreh.io/product) with AMD MI250 GPUs.
- The MoAI platform simplifies scalable, cost-efficient training of large-scale models across multiple nodes.
- It also provides optimized and automated parallelization without complex manual work.
- More information on the MoAI platform is available at https://moreh.io/product, or you can contact us directly at [contact@moreh.io](mailto:contact@moreh.io).

## Quick Usage

You can chat with Motif directly through our [Model hub](https://model-hub.moreh.io/).

## Details

More details will be provided in the upcoming technical report.

### Release Date

2024.09.30

### Benchmark Results

| Model                     | KMMLU |
|---------------------------|-------|
| GPT-4-base-0613 \*\*      | 57.62 |
| Llama3.1-70B-instruct \*  | 52.1  |
| **Motif-102B** \*\* +     | 58.25 |
| Motif-102B-Instruct \*\* +| 57.98 |

'\*' : Community reported
'\*\*' : Measured by the authors
'+' : Indicates the model is specialized in Korean

## How to use

### Use with vLLM

- Minimum requirements: 4x A100 80GB GPUs
- Refer to this [link](https://github.com/vllm-project/vllm) to install vLLM

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# At minimum, we recommend 4x A100 80GB GPUs for inference with vLLM.
# If you have more GPUs, set tensor_parallel_size to the number of GPUs available.
model = LLM("moreh/Motif-102B-Instruct", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Motif-102B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartner"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the Hugging Face generation_config, so sampling parameters are set explicitly here.
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])

responses = model.generate(messages_batch, sampling_params=sampling_params)
print(responses[0].outputs[0].text)
```

### Use with transformers

- Minimum requirements: 4x A100 80GB GPUs OR 4x AMD MI250 GPUs

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moreh/Motif-102B-Instruct"

# All generation configs are set in generation_config.json.
# The model does not fit on a single GPU, so shard it across the available GPUs with device_map="auto".
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)

input_ids = tokenizer(prompt, return_tensors='pt')['input_ids'].to(model.device)
outputs = model.generate(input_ids)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
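If you prefer to see tokens as they are produced rather than waiting for the full completion, the sketch below streams the output with transformers' `TextStreamer`. This is a minimal illustration, not an official usage pattern, and it assumes the same `model`, `tokenizer`, and `input_ids` as in the transformers example above.

```python
from transformers import TextStreamer

# Print decoded text to stdout as tokens are generated.
# skip_prompt=True avoids echoing the chat-template prompt back to the console.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Reuses model and input_ids from the transformers example above.
_ = model.generate(input_ids, streamer=streamer, max_new_tokens=512)
```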