c4ai-command-r-v01-japanese-instruct

Click here for the GGUF version

Overview

This model is CohereForAI/c4ai-command-r-v01 with additional Japanese instruction tuning applied using the ichikara-instruction dataset.

Training setup

Training was run on 4x A6000 GPUs on a GPU server rented from Runpod. The main training parameters are as follows.

lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
learning_rate: 2e-5
num_train_epochs: 10
batch_size: 50
max_seq_length: 2048
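
As a rough illustration of how these values relate, the sketch below collects them in a small Python container and derives the LoRA scaling factor. The class name `LoraSettings` and the `scaling` property are hypothetical and not part of the original training code, and the sketch assumes the alpha entry in the list above is the LoRA alpha. In the standard LoRA formulation the low-rank update is scaled by alpha / r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the hyperparameters listed above."""

    lora_r: int = 64
    lora_alpha: int = 128  # assumed to be the LoRA alpha entry from the list
    lora_dropout: float = 0.05
    lora_target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )
    learning_rate: float = 2e-5
    num_train_epochs: int = 10
    batch_size: int = 50
    max_seq_length: int = 2048

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r


settings = LoraSettings()
print(settings.scaling)  # 2.0
```

With r = 64 and alpha = 128, the adapter update is applied with a factor of 2. The target module list covers both the attention projections and the MLP projections.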

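Evaluation

Results on jsquad (jsquad-1.1-0.3, 2-shot), jcommonsenseqa (jcommonsenseqa-1.1-0.3, 3-shot), jnli (jnli-1.3-0.3, 3-shot), and marc_ja (marc_ja-1.1-0.3, 3-shot) are shown below. (8-bit quantization; jsquad scores are divided by 100, and every value is rounded half up to three decimal places.)

The average score improved from 0.783 to 0.806, although marc_ja dropped slightly.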

Model	jsquad(exact_match)	jcommonsenseqa(acc)	jnli(acc)	marc_ja(acc)	average
c4ai-command-r-v01	0.809	0.902	0.466	0.954	0.783
c4ai-command-r-v01-japanese-instruct	0.836	0.911	0.537	0.940	0.806
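
The average column can be re-derived from the four task scores. The following is a minimal sketch, not the author's evaluation script: it uses only the numbers in the table above, and the helper names `round_half_up` and `average` are made up for illustration. `Decimal` with `ROUND_HALF_UP` is used so that the rounding matches the round-half-up rule described above rather than Python's default float rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, kept as strings so Decimal stays exact.
# Order: jsquad, jcommonsenseqa, jnli, marc_ja.
SCORES = {
    "c4ai-command-r-v01": ["0.809", "0.902", "0.466", "0.954"],
    "c4ai-command-r-v01-japanese-instruct": ["0.836", "0.911", "0.537", "0.940"],
}


def round_half_up(value: Decimal, places: str = "0.001") -> Decimal:
    """Round to three decimal places, with halves rounded away from zero."""
    return value.quantize(Decimal(places), rounding=ROUND_HALF_UP)


def average(scores: list) -> Decimal:
    """Mean of the task scores, rounded the same way as the table."""
    values = [Decimal(s) for s in scores]
    return round_half_up(sum(values) / len(values))


for model, scores in SCORES.items():
    print(model, average(scores))
```

The base model's unrounded mean is 0.78275, so the half-up rule is what produces the 0.783 shown in the table.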

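lm-evaluation-harness was used for the evaluation.

The japanese-mt-bench results for the base model and this model are shown below. (Single-turn, 4-bit quantization.)

The scores did not change much (avg_score 7.35 vs. 7.15). However, the base model's outputs sometimes had English mixed in, whereas in the outputs checked by eye this no longer happened, so the training does seem to have had some effect.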

Model	Coding	Extraction	Humanities	Math	Reasoning	Roleplay	STEM	Writing	avg_score
c4ai-command-r-v01	6.1	7.9	9.7	2.4	6.0	8.3	9.8	8.6	7.35
c4ai-command-r-v01-japanese-instruct	5.6	8.3	8.1	3.4	6.1	7.9	9.2	8.6	7.15
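
To see where the two models differ, the per-category differences can be computed directly from the table. This is a small sketch using only the scores above; the function name `deltas` is hypothetical.

```python
CATEGORIES = [
    "Coding", "Extraction", "Humanities", "Math",
    "Reasoning", "Roleplay", "STEM", "Writing",
]
# Scores copied from the japanese-mt-bench table above.
BASE = [6.1, 7.9, 9.7, 2.4, 6.0, 8.3, 9.8, 8.6]
TUNED = [5.6, 8.3, 8.1, 3.4, 6.1, 7.9, 9.2, 8.6]


def deltas(base: list, tuned: list) -> dict:
    """Return tuned minus base per category, rounded to one decimal place."""
    return {c: round(t - b, 1) for c, b, t in zip(CATEGORIES, base, tuned)}


diff = deltas(BASE, TUNED)
improved = [c for c, v in diff.items() if v > 0]
regressed = [c for c, v in diff.items() if v < 0]

print("improved:", improved)
print("regressed:", regressed)
```

On these numbers, Extraction, Math, and Reasoning go up, Writing is unchanged, and the remaining four categories go down, with Humanities showing the largest drop (-1.6).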

As an aside, the score on the Open LLM Leaderboard, an English benchmark, also improved slightly for some reason. Details:

Metric	c4ai-command-r-v01	c4ai-command-r-v01-japanese-instruct
Avg.	68.54	68.85
AI2 Reasoning Challenge (25-Shot)	65.53	65.87
HellaSwag (10-Shot)	87	85.62
MMLU (5-Shot)	68.2	67.61
TruthfulQA (0-shot)	52.32	51.01
Winogrande (5-shot)	81.53	82.95
GSM8k (5-shot)	56.63	60.05

License

The base model, CohereForAI/c4ai-command-r-v01, is distributed under CC-BY-NC 4.0 and C4AI's Acceptable Use Policy.

The dataset used for fine-tuning, ichikara-instruction, is distributed under CC-BY-NC-SA 4.0.

Therefore, this model is licensed under CC-BY-NC-SA 4.0 together with C4AI's Acceptable Use Policy. (Please point it out if this understanding is wrong.)

Aratako/c4ai-command-r-v01-japanese-instruct