aixsatoshi
/

Swallow-MX-8x7b-NVE-chatvector-Mixtral-instruct

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Swallow-MX-8x7b-NVE-chatvector-Mixtral-instruct / README.md

aixsatoshi's picture

Update README.md

1f49f1d verified 7 months ago

|

history blame contribute delete

2.07 kB

	---
	license: apache-2.0
	language:
	- ja
	---
	更新情報
	日本語機能とinstructベクトルのバランス調整したver.2をアップロードしました
	[Swallow-MX-8x7b-NVE-chatvector-Mixtral-instruct-v2](https://huggingface.co/aixsatoshi/Swallow-MX-8x7b-NVE-chatvector-Mixtral-instruct-v2)

	モデル概要

	[Swallow-MX-8x7b-NVE-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1)に対し、

	[Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)と
	[Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)の差分をマージしたモデルです。

	> [Swallow-MX-8x7b-NVE-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1) + [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) - [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)


	Swallow-MX-8x7b-NVE-v0.1は、コンテクスト長4096までの日本語継続学習モデルですが、
	英語モデルのInstructベクトルをマージすることで、流暢な日本語機能を維持してコンテクスト長を32Kまで拡大、Instruct機能を大幅アップしました。

	注目すべき点
	1、4096token以上の部分は日本語継続事前学習していないが、今回の英語モデルの差分マージのみで日本語機能が32Kまで維持出来ている点
	2、英語モデルのInstruct機能が、差分マージのみで日本語モデルに簡単に移行できる点

	詳細は以下文献を参照ください。

	参考文献
	[LLM差分マージしてみた](https://zenn.dev/platina/articles/cdab4992bf39d2)
	[Chat Vector](https://arxiv.org/abs/2310.04799)
	[Chat Vectorを使って日本語LLMをチャットモデルに改造する](https://qiita.com/jovyan/items/ee6affa5ee5bdaada6b4)
	[jovyan/Swallow-MS-7b-v0.1-ChatVector](https://huggingface.co/jovyan/Swallow-MS-7b-v0.1-ChatVector)
	[kousw/stablelm-gamma-7b-chatvector](https://huggingface.co/kousw/stablelm-gamma-7b-chatvector)