---
base_model:
- shisa-ai/shisa-v1-llama3-8b
- aixsatoshi/Llama-3-youko-8b-instruct-chatvector
- meta-llama/Meta-Llama-3-8B-Instruct
- lightblue/suzume-llama-3-8B-multilingual
library_name: transformers
tags:
- mergekit
- merge
license: llama3
language:
- ja
---
|
# Llama-3-Umievo-itr014-Shizuko-8b |
|
|
|
This model is an evolutionary merge of four Japanese-capable Llama-3-based models, combined using an evolutionary algorithm. The four source models are Meta-Llama-3-8B-Instruct, Llama-3-youko-8b-instruct-chatvector, suzume-llama-3-8B-multilingual, and shisa-v1-llama3-8b.

We would like to thank Meta, aixsatoshi, LightBlue, and Shisa-AI, the creators of the models used in this merge.
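The per-slice merge weights in the configuration below were found by an automated search rather than hand-tuning. This card does not document the exact optimizer, so the following is a purely illustrative sketch of how an evolutionary search over per-slice weights might look; `evaluate_japanese_score`, the population size, and the mutation scale are hypothetical placeholders, not the actual procedure.

```python
# Illustrative only: a simple (1+lambda)-style evolutionary search over
# per-slice merge weights. The optimizer actually used for this model is not
# documented here; evaluate_japanese_score is a hypothetical stand-in for
# "build the merge with these weights, then score it on a Japanese benchmark".
import random

N_SLICES, N_MODELS = 8, 4            # 8 layer slices x 4 source models
POP_SIZE, GENERATIONS, SIGMA = 16, 30, 0.1

def evaluate_japanese_score(weights):
    # Placeholder fitness. In practice: run mergekit with these weights and
    # judge the resulting model on a benchmark such as ElyzaTasks100.
    return random.random()

def mutate(weights):
    # Gaussian perturbation, clipped at zero to keep weights non-negative.
    return [[max(0.0, w + random.gauss(0.0, SIGMA)) for w in slice_w]
            for slice_w in weights]

best = [[1.0] * N_MODELS for _ in range(N_SLICES)]   # start from uniform weights
best_score = evaluate_japanese_score(best)
for _ in range(GENERATIONS):
    for cand in (mutate(best) for _ in range(POP_SIZE)):
        score = evaluate_japanese_score(cand)
        if score > best_score:
            best, best_score = cand, score
```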
|
|
|
The model scored an average of 3.85 on the ElyzaTasks100 benchmark (the mean of three automatic evaluations by Llama3-70B).
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630420b4eedc089484c853e8/x4BbxfaW_wXPjDfv1Z4lJ.png) |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "umiyuki/Llama-3-Umievo-itr014-Shizuko-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You must answer all responses in Japanese.あなたは役に立つ誠実な日本人のアシスタントです。あなたは全ての回答に日本語で答えなければならない。"},
    {"role": "user", "content": "二人の少女が終末世界を旅する物語を書いてください。"},
]

# Build the prompt with the Llama-3 chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Llama 3 ends each assistant turn with <|eot_id|>, so stop on either token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
|
|
|
|
|
|
|
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). |
|
|
|
## Merge Details |
|
### Merge Method |
|
|
|
This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method, with [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the base.
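Conceptually, linear merging is a weighted average of the source models' parameter tensors; with `normalize: 1.0` in the configuration below, each slice's weights are divided by their sum. A minimal sketch of the idea (not mergekit's actual implementation, which additionally handles layer slicing and int8 masking):

```python
# Minimal sketch of normalized linear merging; mergekit's real implementation
# also handles layer slicing, dtype casting, and int8 masking.
import torch

def linear_merge(state_dicts, weights):
    # merged[name] = sum_i w_i * theta_i[name] / sum_i w_i
    total = sum(weights)
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, state_dicts)) / total
        for name in state_dicts[0]
    }

# Toy usage with two tiny "models" holding a single parameter each:
a = {"w": torch.ones(2, 2)}
b = {"w": torch.zeros(2, 2)}
print(linear_merge([a, b], [0.75, 0.25])["w"])  # every entry is 0.75
```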
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* [shisa-ai/shisa-v1-llama3-8b](https://huggingface.co/shisa-ai/shisa-v1-llama3-8b) |
|
* [aixsatoshi/Llama-3-youko-8b-instruct-chatvector](https://huggingface.co/aixsatoshi/Llama-3-youko-8b-instruct-chatvector) |
|
* [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) |
|
|
|
### Configuration |
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
dtype: bfloat16
merge_method: linear
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 4]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4149739730274144
  - layer_range: [0, 4]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6781276007090549
  - layer_range: [0, 4]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.34616999273932425
  - layer_range: [0, 4]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.3720042419649354
- sources:
  - layer_range: [4, 8]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.07652836818139683
  - layer_range: [4, 8]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.234379009181979
  - layer_range: [4, 8]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.0146729889059811
  - layer_range: [4, 8]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.5811532109389872
- sources:
  - layer_range: [8, 12]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.5551700273906248
  - layer_range: [8, 12]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.7418501521559635
  - layer_range: [8, 12]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.442504375594772
  - layer_range: [8, 12]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.6475631873316974
- sources:
  - layer_range: [12, 16]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4227647782669271
  - layer_range: [12, 16]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.2969869792284983
  - layer_range: [12, 16]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.7818773805802817
  - layer_range: [12, 16]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.8007371182560976
- sources:
  - layer_range: [16, 20]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.10979010874744283
  - layer_range: [16, 20]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.19009547180175693
  - layer_range: [16, 20]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.6064294349661996
  - layer_range: [16, 20]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.7630087852386511
- sources:
  - layer_range: [20, 24]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.219671192433268
  - layer_range: [20, 24]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6303503074132494
  - layer_range: [20, 24]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.46265431269055757
  - layer_range: [20, 24]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.4662350856064592
- sources:
  - layer_range: [24, 28]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.1400550380200451
  - layer_range: [24, 28]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.031570135674053
  - layer_range: [24, 28]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5760956440228217
  - layer_range: [24, 28]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.5264012437679564
- sources:
  - layer_range: [28, 32]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 1.2311282964552015
  - layer_range: [28, 32]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.43811773040605967
  - layer_range: [28, 32]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5150682019605872
  - layer_range: [28, 32]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.342193342214983
```
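To reproduce a merge from this configuration, it should in principle be possible to save it as, say, `config.yaml` and pass it to mergekit's `mergekit-yaml` command (`mergekit-yaml config.yaml ./output-model-directory`); the exact invocation used for this model is not documented here.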
|
|
|
Built with Meta Llama 3 |
|
|
|
Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.