
Llama-3.1-Swallow-70B-Nemotron-Instruct-v0.1

Llama-3.1-Swallow-70B-Nemotron-Instruct-v0.1 is a merge of nvidia/Llama-3.1-Nemotron-70B-Instruct-HF and tokyotech-llm/Llama-3.1-Swallow-70B-v0.1 using mergekit (see the configuration below).

This is an experimental model created to add Japanese knowledge to Llama-3.1-Nemotron-70B. It produces reasonably good output, but has one known issue: it does not emit <|eot_id|>, so "assistant" (token ID: 78191) must be set as a substitute EOS token. If you need stability, use nvidia/Llama-3.1-Nemotron-70B-Instruct-HF or tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.1 instead.
Many thanks to the developers of Nemotron and Swallow.
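
Before running the usage example, you can verify that "assistant" really maps to token ID 78191 in this model's tokenizer. A minimal sanity-check sketch (the expected value is taken from the note above):

from transformers import AutoTokenizer

# Confirm the ID of "assistant", which serves as the substitute EOS token.
tokenizer = AutoTokenizer.from_pretrained("misdelivery/Llama-3.1-Swallow-70B-Nemotron-Instruct-v0.1")
print(tokenizer.convert_tokens_to_ids("assistant"))  # expected: 78191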

Usage:

Run pip install -U transformers to update to the latest version (version at the time of creation: 4.45.2).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "misdelivery/Llama-3.1-Swallow-70B-Nemotron-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many r in strawberry?"
messages = [{"role": "user", "content": prompt}]

tokenized_message = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_tensors="pt", return_dict=True
)
# The model does not reliably emit <|eot_id|>, so use "assistant"
# (token ID 78191) as the EOS token to stop generation.
response_token_ids = model.generate(
    tokenized_message["input_ids"].cuda(),
    attention_mask=tokenized_message["attention_mask"].cuda(),
    max_new_tokens=1024,
    eos_token_id=78191,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, skipping the prompt.
generated_tokens = response_token_ids[:, len(tokenized_message["input_ids"][0]):]
generated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
print(generated_text)

🧩 Configuration

merge_method: task_arithmetic
base_model: meta-llama/Llama-3.1-70B
models:
   - model: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
     parameters:
        weight: 1.0
   - model: tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
     parameters:
        weight: 1.0
dtype: bfloat16
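
To reproduce the merge, save the configuration above to a YAML file and run it through mergekit's command-line tool. A minimal sketch, assuming the mergekit-yaml entry point; config.yaml and the output path are placeholder names, not taken from this card:

pip install mergekit
mergekit-yaml config.yaml ./Llama-3.1-Swallow-70B-Nemotron-Instruct-v0.1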