|
---
license: apache-2.0
datasets:
- llm-jp/oasst1-21k-ja
- llm-jp/oasst2-33k-ja
- HachiML/Hachi-Alpaca
- Aratako/Rosebleu-1on1-Dialogues-RP
- baobab-trees/wikipedia-human-retrieval-ja
- aixsatoshi/Longcontext-aozora-summary
- aixsatoshi/Longcontext-aozora-instruction
- kunishou/amenokaku-code-instruct
- HachiML/Evol-hh-rlhf-gen3-1k
- Kendamarron/jimba-wiki-instruction-calm3
- Manual-Dataset-Creation-Project/Malum-130
- sudy-super/CoTangent
- minnade/chat-daily
---
|
# Yamase-12B |
|
### Description |
|
Yamase-12B-v0.1 is a model fine-tuned from [Mistral-Nemo-Instruct](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) on approximately 110,000 training examples to improve its Japanese-language capabilities.
|
|
|
### Usage |
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example Japanese prompt: "When I went on a trip, there were many high-rise
# buildings. What can be inferred from this?"
text = "旅行に行くと高層ビルがたくさん建っていました。これからどのようなことが推測できますか?"

model_name = "sudy-super/Yamase-12B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
if torch.cuda.is_available():
    model = model.to("cuda")
model.eval()

# Build the prompt with the model's chat template (see "Chat Template" below).
messages = [
    {"role": "user", "content": text},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

with torch.no_grad():
    token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.95,
        top_k=50,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (everything after the prompt).
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1):], skip_special_tokens=False)
print(output)
"""

"""
```
|
|
|
### Chat Template |
|
``` |
|
<s>[INST]明日の東京の天気は何ですか?[/INST]晴れです。</s>[INST]大阪はどうですか?[/INST]雨です。</s> |
|
``` |
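
For reference, a minimal sketch of how the multi-turn conversation above can be rendered through `apply_chat_template` instead of being formatted by hand. This assumes the chat template bundled with the tokenizer produces the format shown; the exact whitespace and special tokens are determined by that template.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sudy-super/Yamase-12B")

# The same two-turn conversation as in the template example above.
messages = [
    {"role": "user", "content": "明日の東京の天気は何ですか?"},
    {"role": "assistant", "content": "晴れです。"},
    {"role": "user", "content": "大阪はどうですか?"},
    {"role": "assistant", "content": "雨です。"},
]

# tokenize=False returns the rendered prompt string rather than token IDs.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```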
|
|
|
|
|
### Hyperparameters
|
``` |
|
num_train_epochs: 5 |
|
per_device_train_batch_size: 2 |
|
per_device_eval_batch_size: 2 |
|
gradient_accumulation_steps: 128 |
|
learning_rate: 2e-5 |
|
lr_scheduler_kwargs: {"min_lr": 2e-6}
|
lr_scheduler_type: "cosine_with_min_lr" |
|
warmup_ratio: 0.1 |
|
dataloader_pin_memory: True |
|
gradient_checkpointing: True |
|
bf16: True |
|
optim: "adamw_torch_fused" |
|
weight_decay: 0.0 |
|
max_grad_norm: 1.0 |
|
adam_beta2: 0.99 |
|
label_smoothing_factor: 0.0 |
|
seed: 42 |
|
``` |
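
The values above map onto Hugging Face `TrainingArguments` roughly as in the sketch below. This is an illustration under assumptions, not the author's training script: `output_dir` is a placeholder, and `cosine_with_min_lr` with `lr_scheduler_kwargs` requires a recent transformers release.

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as TrainingArguments.
# output_dir is hypothetical; fields not listed in this card are left at defaults.
training_args = TrainingArguments(
    output_dir="yamase-12b-sft",  # hypothetical path
    num_train_epochs=5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=128,
    learning_rate=2e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 2e-6},
    warmup_ratio=0.1,
    dataloader_pin_memory=True,
    gradient_checkpointing=True,
    bf16=True,
    optim="adamw_torch_fused",
    weight_decay=0.0,
    max_grad_norm=1.0,
    adam_beta2=0.99,
    label_smoothing_factor=0.0,
    seed=42,
)
```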
|
|
|
### Author |
|
[Rakuto Suda](https://huggingface.co/sudy-super) |