---
license: apache-2.0
datasets:
- llm-jp/oasst1-21k-ja
- llm-jp/oasst2-33k-ja
- HachiML/Hachi-Alpaca
- Aratako/Rosebleu-1on1-Dialogues-RP
- baobab-trees/wikipedia-human-retrieval-ja
- aixsatoshi/Longcontext-aozora-summary
- aixsatoshi/Longcontext-aozora-instruction
- kunishou/amenokaku-code-instruct
- HachiML/Evol-hh-rlhf-gen3-1k
- Kendamarron/jimba-wiki-instruction-calm3
- Manual-Dataset-Creation-Project/Malum-130
- sudy-super/CoTangent
- minnade/chat-daily
---
# Yamase-12B
### Description
Yamase-12B-v0.1 is a model obtained by fine-tuning [Mistral-Nemo-Instruct](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) on approximately 110,000 examples, with the goal of improving its Japanese-language capability.
### Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Prompt (Japanese): "When I went on a trip, there were many high-rise buildings.
# What can be inferred from this?"
text = "旅行に行くと高層ビルがたくさん建っていました。これからどのようなことが推測できますか?"
model_name = "sudy-super/Yamase-12B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

if torch.cuda.is_available():
    model = model.to("cuda")
model.eval()

messages = [
    {"role": "user", "content": text},
]
# Render the conversation into the model's prompt format as a plain string.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

with torch.no_grad():
    # The rendered prompt already contains the special tokens, so none are added here.
    token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.95,
        top_k=50,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=False)
print(output)
```
### Chat Template
```
<s>[INST]明日の東京の天気は何ですか?[/INST]晴れです。</s>[INST]大阪はどうですか?[/INST]雨です。</s>
```
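The layout above can be reproduced by hand as a sanity check. The following is a minimal sketch, assuming the format is exactly as shown (one leading `<s>`, each user turn wrapped in `[INST]...[/INST]`, each assistant turn closed with `</s>`, no extra whitespace). `build_prompt` is a hypothetical helper written for illustration, not part of the model or of `transformers`; `tokenizer.apply_chat_template` remains the authoritative way to build prompts.

```python
def build_prompt(messages):
    """Render a list of {"role", "content"} dicts in the layout shown above.

    Hypothetical helper for illustration only; it handles just the
    "user" and "assistant" roles.
    """
    parts = ["<s>"]  # beginning-of-sequence marker appears once, at the start
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            parts.append(f"[INST]{content}[/INST]")
        elif role == "assistant":
            parts.append(f"{content}</s>")  # each assistant turn ends with </s>
        else:
            raise ValueError(f"unsupported role: {role}")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "明日の東京の天気は何ですか?"},
    {"role": "assistant", "content": "晴れです。"},
    {"role": "user", "content": "大阪はどうですか?"},
    {"role": "assistant", "content": "雨です。"},
]
print(build_prompt(conversation))
```

Comparing this string against the output of `tokenizer.apply_chat_template(conversation, tokenize=False)` is a quick way to confirm whether the assumption about the layout holds for the installed tokenizer version.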
### Hyperparameters
```
num_train_epochs: 5
per_device_train_batch_size: 2
per_device_eval_batch_size: 2
gradient_accumulation_steps: 128
learning_rate: 2e-5
lr_scheduler_kwargs: {"min_lr": 2e-6}
lr_scheduler_type: "cosine_with_min_lr"
warmup_ratio: 0.1
dataloader_pin_memory: True
gradient_checkpointing: True
bf16: True
optim: "adamw_torch_fused"
weight_decay: 0.0
max_grad_norm: 1.0
adam_beta2: 0.99
label_smoothing_factor: 0.0
seed: 42
```
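To make these settings more concrete, the sketch below works out two derived quantities: the effective batch size per optimizer step, and the learning rate over time under a warmup-then-cosine schedule with a floor at `min_lr`. The number of GPUs and the total number of optimizer steps are not stated in this card, so `num_devices` and `total_steps` are hypothetical placeholders. The schedule function is an approximation written for illustration and may differ in detail from the scheduler used during training.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices):
    """Sequences contributing to a single optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def lr_at_step(step, total_steps, warmup_steps, base_lr=2e-5, min_lr=2e-6):
    """Linear warmup to base_lr, then cosine decay down to min_lr (a sketch)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return min_lr + (base_lr - min_lr) * cosine


# Values taken from the list above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 128
warmup_ratio = 0.1

# Assumptions, not stated in this card.
num_devices = 1      # hypothetical device count
total_steps = 1000   # hypothetical number of optimizer steps

warmup_steps = int(total_steps * warmup_ratio)

print(effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_devices))
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at_step(step, total_steps, warmup_steps))
```

With a single device, each optimizer step covers 2 × 128 = 256 sequences; the figure scales linearly with the actual device count.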
### Author
[Rakuto Suda](https://huggingface.co/sudy-super)