---
language:
- ko
- en
library_name: transformers
base_model:
- moreh/Llama-3-Motif-102B
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0c845a04a514ba62bcd1a/RFpsPxlc_3cK0kmWj-tYR.png)

# **Introduction**  
We introduce Llama-3-Motif, a new family of language models from [**Moreh**](https://moreh.io/), specialized in Korean and English.\
Llama-3-Motif-102B-Instruct is a chat model fine-tuned from the base model [Llama-3-Motif-102B](https://huggingface.co/moreh/Motif-102B).

## Training Platform
- The Llama-3-Motif-102B model family was trained on the [**MoAI platform**](https://moreh.io/product); see the link for more information.  

## Quick Usage  
You can chat with Llama-3-Motif directly through our [Model hub](https://model-hub.moreh.io/).

## Details
More details will be provided in the upcoming technical report.  
The effective context length is 32k tokens (average score 81) on the [RULER](https://github.com/NVIDIA/RULER) benchmark.  

### Release Date  
2024.12.02

### Benchmark Results
    
|Provider|Model|kmmlu_direct score||
|---|---|---|---|
|Moreh|Llama-3-Motif-102B|64.74|+|
|Moreh|**Llama-3-Motif-102B-Instruct**|**64.81**|+|
|Meta|Llama3-70B-instruct|54.5*||
|Meta|Llama3.1-70B-instruct|52.1*||
|Meta|Llama3.1-405B-instruct|65.8*||
|Alibaba|Qwen2-72B-instruct|64.1*||
|OpenAI|GPT-4-0125-preview|59.95*||
|OpenAI|GPT-4o-2024-05-13|64.11**||
|Google|gemini pro|50.18*||
|LG|exaone 3.0|44.5*|+|
|Naver|HyperCLOVA X|53.4*|+|
|Upstage|SOLAR-10.7B|41.65*|+|

\* : Community report  
\*\* : Measured by Moreh  
\+ : Claimed to have better capability in Korean  


## How to use

### Use with vLLM
- Refer to the [vLLM repository](https://github.com/vllm-project/vllm) for installation instructions.  
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Set tensor_parallel_size to the number of GPUs you have available
model = LLM("moreh/Llama-3-Motif-102B-Instruct", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Llama-3-Motif-102B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]

messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not use the Hugging Face generation_config, so set the sampling parameters explicitly
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)

print(responses[0].outputs[0].text)
```
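
The comment about `tensor_parallel_size` can be made concrete with a rough, weights-only estimate. The sketch below assumes 102 billion parameters (taken from the model name), 16-bit weights, 80 GB GPUs, and vLLM's default `gpu_memory_utilization` of 0.9. The helper names are hypothetical, and KV cache and activation memory are ignored, so treat the result as a lower bound rather than a sizing guarantee.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights only, in decimal GB."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    num_params: float,
    bytes_per_param: int = 2,      # assumption: bf16/fp16 weights
    gpu_mem_gb: float = 80.0,      # assumption: 80 GB per GPU
    utilization: float = 0.9,      # vLLM's default gpu_memory_utilization
) -> int:
    """Smallest GPU count whose usable memory can hold the weights."""
    usable_gb = gpu_mem_gb * utilization
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / usable_gb)


def next_power_of_two(n: int) -> int:
    """Round n up to the nearest power of two (n >= 1)."""
    return 1 << (n - 1).bit_length()


weights_gb = weight_memory_gb(102e9)
lower_bound = min_gpus_for_weights(102e9)
print(
    f"weights: ~{weights_gb:.0f} GB, at least {lower_bound} GPUs, "
    f"try tensor_parallel_size={next_power_of_two(lower_bound)}"
)
```

Under these assumptions the weights alone need about 204 GB, which gives a lower bound of 3 GPUs. vLLM requires the number of attention heads to be divisible by the tensor-parallel size, so a power of two such as the 4 used above is a common choice.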

### Use with transformers  
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moreh/Llama-3-Motif-102B-Instruct"

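# Generation settings are read from the model's generation_config.json.
# A 102B model does not fit on a single GPU, so shard it across the available GPUs
# (device_map="auto" requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")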
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]

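# The chat template already inserts the special tokens, so tokenize in the same call
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(**inputs)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))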
```
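
Both snippets above rely on `apply_chat_template` with `add_generation_prompt=True`. To illustrate what that call produces, here is a minimal hand-written sketch that assumes the tokenizer ships the standard Llama 3 chat template; this card does not confirm that, so check `tokenizer.chat_template` for the actual format. The function `render_llama3_prompt` is hypothetical and is not part of transformers.

```python
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render messages in the Llama 3 chat format (assumed, not confirmed for this model)."""
    parts = [BOS]
    for message in messages:
        # Each turn is a role header, a blank line, the content, and an end-of-turn token
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n")
        parts.append(message["content"].strip() + EOT)
    if add_generation_prompt:
        # An open assistant header tells the model to start its reply
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
prompt = render_llama3_prompt(messages)
print(prompt)
```

Without `add_generation_prompt=True` the prompt would end after the user turn, and the model would have no cue that an assistant reply should follow.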