---
base_model:
- shisa-ai/shisa-v1-llama3-8b
- aixsatoshi/Llama-3-youko-8b-instruct-chatvector
- meta-llama/Meta-Llama-3-8B-Instruct
- lightblue/suzume-llama-3-8B-multilingual
library_name: transformers
tags:
- mergekit
- merge
license: llama3
language:
- ja
---
|
# Llama-3-Umievo-itr014-Shizuko-8b |
|
|
|
This model is an evolutionary merge of four Japanese-capable Llama-3-based models, combined using an evolutionary algorithm. The four source models are Meta-Llama-3-8B-Instruct, Llama-3-youko-8b-instruct-chatvector, suzume-llama-3-8B-multilingual, and shisa-v1-llama3-8b.

We would like to thank Meta, aixsatoshi, LightBlue, and Shisa-AI, the creators of the models used in this merge.
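The per-slice merge weights in the configuration below were found by an automated search rather than hand-tuning. This card does not document the exact optimizer, so the following is a purely illustrative sketch of how an evolutionary search over per-slice weights might look; `evaluate_japanese_score`, the population size, and the mutation scale are hypothetical placeholders, not the actual procedure.

```python
# Illustrative only: a simple (1+lambda)-style evolutionary search over
# per-slice merge weights. The optimizer actually used for this model is not
# documented here; evaluate_japanese_score is a hypothetical stand-in for
# "build the merge with these weights, then score it on a Japanese benchmark".
import random

N_SLICES, N_MODELS = 8, 4            # 8 layer slices x 4 source models
POP_SIZE, GENERATIONS, SIGMA = 16, 30, 0.1

def evaluate_japanese_score(weights):
    # Placeholder fitness. In practice: run mergekit with these weights and
    # judge the resulting model on a benchmark such as ElyzaTasks100.
    return random.random()

def mutate(weights):
    # Gaussian perturbation, clipped at zero to keep weights non-negative.
    return [[max(0.0, w + random.gauss(0.0, SIGMA)) for w in slice_w]
            for slice_w in weights]

best = [[1.0] * N_MODELS for _ in range(N_SLICES)]   # start from uniform weights
best_score = evaluate_japanese_score(best)
for _ in range(GENERATIONS):
    for cand in (mutate(best) for _ in range(POP_SIZE)):
        score = evaluate_japanese_score(cand)
        if score > best_score:
            best, best_score = cand, score
```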
|
|
|
The model scored an average of 3.85 on the ElyzaTasks100 benchmark (the mean of three automatic evaluations by Llama3-70B).
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630420b4eedc089484c853e8/x4BbxfaW_wXPjDfv1Z4lJ.png) |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "umiyuki/Llama-3-Umievo-itr014-Shizuko-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You must answer all responses in Japanese.あなたは役に立つ誠実な日本人のアシスタントです。あなたは全ての回答に日本語で答えなければならない。"},
    {"role": "user", "content": "二人の少女が終末世界を旅する物語を書いてください。"},
]

# Build the prompt with the Llama-3 chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Llama 3 ends each assistant turn with <|eot_id|>, so stop on either token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
|
|
|
|
|
|
|
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). |
|
|
|
## Merge Details |
|
### Merge Method |
|
|
|
This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method, with [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the base.
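Conceptually, linear merging is a weighted average of the source models' parameter tensors; with `normalize: 1.0` in the configuration below, each slice's weights are divided by their sum. A minimal sketch of the idea (not mergekit's actual implementation, which additionally handles layer slicing and int8 masking):

```python
# Minimal sketch of normalized linear merging; mergekit's real implementation
# also handles layer slicing, dtype casting, and int8 masking.
import torch

def linear_merge(state_dicts, weights):
    # merged[name] = sum_i w_i * theta_i[name] / sum_i w_i
    total = sum(weights)
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, state_dicts)) / total
        for name in state_dicts[0]
    }

# Toy usage with two tiny "models" holding a single parameter each:
a = {"w": torch.ones(2, 2)}
b = {"w": torch.zeros(2, 2)}
print(linear_merge([a, b], [0.75, 0.25])["w"])  # every entry is 0.75
```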
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* [shisa-ai/shisa-v1-llama3-8b](https://huggingface.co/shisa-ai/shisa-v1-llama3-8b) |
|
* [aixsatoshi/Llama-3-youko-8b-instruct-chatvector](https://huggingface.co/aixsatoshi/Llama-3-youko-8b-instruct-chatvector) |
|
* [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) |
|
|
|
### Configuration |
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
dtype: bfloat16
merge_method: linear
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 4]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4149739730274144
  - layer_range: [0, 4]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6781276007090549
  - layer_range: [0, 4]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.34616999273932425
  - layer_range: [0, 4]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.3720042419649354
- sources:
  - layer_range: [4, 8]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.07652836818139683
  - layer_range: [4, 8]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.234379009181979
  - layer_range: [4, 8]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.0146729889059811
  - layer_range: [4, 8]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.5811532109389872
- sources:
  - layer_range: [8, 12]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.5551700273906248
  - layer_range: [8, 12]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.7418501521559635
  - layer_range: [8, 12]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.442504375594772
  - layer_range: [8, 12]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.6475631873316974
- sources:
  - layer_range: [12, 16]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4227647782669271
  - layer_range: [12, 16]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.2969869792284983
  - layer_range: [12, 16]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.7818773805802817
  - layer_range: [12, 16]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.8007371182560976
- sources:
  - layer_range: [16, 20]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.10979010874744283
  - layer_range: [16, 20]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.19009547180175693
  - layer_range: [16, 20]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.6064294349661996
  - layer_range: [16, 20]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.7630087852386511
- sources:
  - layer_range: [20, 24]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.219671192433268
  - layer_range: [20, 24]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6303503074132494
  - layer_range: [20, 24]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.46265431269055757
  - layer_range: [20, 24]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.4662350856064592
- sources:
  - layer_range: [24, 28]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.1400550380200451
  - layer_range: [24, 28]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.031570135674053
  - layer_range: [24, 28]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5760956440228217
  - layer_range: [24, 28]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.5264012437679564
- sources:
  - layer_range: [28, 32]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 1.2311282964552015
  - layer_range: [28, 32]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.43811773040605967
  - layer_range: [28, 32]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5150682019605872
  - layer_range: [28, 32]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.342193342214983
```
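To reproduce a merge from this configuration, it should in principle be possible to save it as, say, `config.yaml` and pass it to mergekit's `mergekit-yaml` command (`mergekit-yaml config.yaml ./output-model-directory`); the exact invocation used for this model is not documented here.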
|
|
|
Built with Meta Llama 3 |
|
|
|
Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.