License: MIT


Introduction

We introduce Motif, a new family of language models from Moreh, specialized in Korean and English.
Motif-102B-Instruct is a chat model fine-tuned from the base model Motif-102B.

Training Platform

  • Motif-102B was trained on the MoAI platform with AMD MI250 GPUs.
  • The MoAI platform simplifies scalable, cost-efficient training of large-scale models across multiple nodes.
  • It also supports a variety of optimized and automated parallelization strategies without any complex manual work.
  • More information on the MoAI platform is available at https://moreh.io/product.
  • You can also contact us directly at contact@moreh.io.

Quick Usage

You can chat with our model Motif directly through our Model hub.

Details

More details will be provided in the upcoming technical report.

Release Date

2024.09.30

Benchmark Results

Model                      KMMLU
GPT-4-base-0613 **         57.62
Llama3.1-70B-instruct *    52.1
Motif-102B **+             58.25
Motif-102B-Instruct **+    57.98

*  : Community reported
** : Measured by the authors
+  : Indicates the model is specialized in Korean

How to use

Use with vLLM

  • Minimum requirements: 4x A100 80GB GPUs (a quick GPU check is sketched below)
  • Refer to the vLLM installation documentation to install vLLM (for example, pip install vllm)
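Before running the example below, it can help to confirm that enough GPUs are visible. This is a minimal optional check, assuming PyTorch is installed alongside vLLM:

import torch

# The example below uses tensor_parallel_size=4, so at least four CUDA devices should be visible.
assert torch.cuda.device_count() >= 4, "At least 4x 80GB-class GPUs are recommended for Motif-102B-Instruct"

The offline-generation example itself: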
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# At minimum, we recommend 4x A100 80GB GPUs for inference with vLLM.
# If you have more GPUs, set tensor_parallel_size to the number of GPUs you can afford.
model = LLM("moreh/Motif-102B-Instruct", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Motif-102B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},  # "Explain the concept of the Big Bang theory to a kindergartener"
]

messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the Hugging Face generation_config, so set the sampling parameters explicitly.
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)

print(responses[0].outputs[0].text)
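The example above wraps a single conversation in a list; model.generate also accepts several prompts at once, which is how vLLM batches offline requests. A minimal sketch reusing the model, tokenizer, and sampling_params objects from above (the second question is an illustrative placeholder, not from the original card):

# Batch several conversations in one generate() call; vLLM schedules them together.
conversations = [
    [{"role": "system", "content": "You are a helpful assistant"},
     {"role": "user", "content": "Explain the concept of the Big Bang theory to a kindergartener."}],
    [{"role": "system", "content": "You are a helpful assistant"},
     {"role": "user", "content": "한글의 역사를 두 문장으로 요약해보세요"}],  # hypothetical prompt: "Summarize the history of Hangul in two sentences"
]
prompts = [
    tokenizer.apply_chat_template(conversation=c, add_generation_prompt=True, tokenize=False)
    for c in conversations
]
responses = model.generate(prompts, sampling_params=sampling_params)
for response in responses:
    print(response.outputs[0].text)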

Use with transformers

  • Minimum requirements: 4x A100 80GB GPUs or 4x AMD MI250 GPUs
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moreh/Motif-102B-Instruct"

# All generation configs are set in generation_config.json.
# device_map="auto" shards the model across all visible GPUs; the full model does not fit on a single 80GB GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},  # "Explain the concept of the Big Bang theory to a kindergartener"
]

messages_batch = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(messages_batch, padding=True, return_tensors='pt')['input_ids'].to(model.device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
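If you want tokens printed as they are generated rather than after generate() returns, transformers provides TextStreamer. A minimal sketch reusing the model, tokenizer, and input_ids from above (the max_new_tokens value is an illustrative choice, not from the original card):

from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(input_ids, streamer=streamer, max_new_tokens=512)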