---
license: cc-by-sa-3.0
tags:
- Composer
- MosaicML
- llm-foundry
- StreamingDatasets
- mpt-7b
datasets:
- kunishou/databricks-dolly-15k-ja
- Jumtra/oasst1_ja
- Jumtra/jglue_jsquad
- Jumtra/jglue_jsquads_with_input
inference: false
language:
- ja
---

# MPT-7B-inst

This model was fine-tuned from [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) using MosaicML's llm-foundry repository.

## Model Date

June 28, 2023

## Model License

CC-BY-SA-3.0

## Evaluation

Model accuracy was evaluated on [Jumtra/test_data_100QA](https://huggingface.co/datasets/Jumtra/test_data_100QA). Each score is the number of correct answers out of 100 questions.

| Model name | Accuracy |
| ---- | ---- |
| mosaicml/mpt-7b | 16/100 |
| mosaicml/mpt-7b-instruct | 28/100 |
| Jumtra/mpt-7b-base | 47/100 |
| Jumtra/mpt-7b-inst | 46/100 |
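
The counts in the table convert directly to percentages. The short sketch below does that arithmetic with the figures reported above and prints the difference between this model and the checkpoint it was fine-tuned from. The variable names (`correct_out_of_100`, `gain`) are illustrative only and are not part of any evaluation script from the authors.

```python
# Correct answers out of 100, copied from the evaluation table.
correct_out_of_100 = {
    "mosaicml/mpt-7b": 16,
    "mosaicml/mpt-7b-instruct": 28,
    "Jumtra/mpt-7b-base": 47,
    "Jumtra/mpt-7b-inst": 46,
}
TOTAL = 100

# Accuracy as a fraction of the 100 test questions.
accuracy = {name: n / TOTAL for name, n in correct_out_of_100.items()}

# Difference, in percentage points, between this model and its starting checkpoint.
gain = (
    correct_out_of_100["Jumtra/mpt-7b-inst"]
    - correct_out_of_100["mosaicml/mpt-7b-instruct"]
)

for name, acc in accuracy.items():
    print(f"{name}: {acc:.0%}")
print(f"Gain over mosaicml/mpt-7b-instruct: {gain} points")
```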


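## Usage

Note: this model requires passing `trust_remote_code=True` to the `from_pretrained` method.
This is because it uses a custom MPT model architecture that is not yet part of the Hugging Face `transformers` package.
MPT includes options for many training-efficiency features, such as FlashAttention, ALiBi, and QK LayerNorm.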

```python
# Prompt format used for fine-tuning and generation
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
# The {instruction} placeholder is kept so it can be filled in later with .format()
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
)
```
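
To see what the template produces, the following self-contained sketch renders it for a single instruction. It redefines the same constants so it can run on its own; `build_prompt` is a hypothetical helper name introduced here for illustration and is not part of the model repository.

```python
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str) -> str:
    """Render the instruction prompt (hypothetical helper, same layout as the template)."""
    return (
        f"{INTRO_BLURB}\n"
        f"{INSTRUCTION_KEY}\n"
        f"{instruction}\n"
        f"{RESPONSE_KEY}\n"
    )


prompt = build_prompt("ニューラルネットワークとは何ですか？")
print(prompt)
```

The rendered prompt ends with the response key followed by a newline, so the model's generated text starts on the line after `### Response:`.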


```python
import torch
import transformers

name = 'Jumtra/mpt-7b-inst'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'torch'
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

# The tokenizer is assumed to ship with the model repository.
tokenizer = transformers.AutoTokenizer.from_pretrained(name)

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True,
).to("cuda:0")
model.eval()

# PROMPT_FOR_GENERATION_FORMAT is defined in the prompt-format snippet
input_text = PROMPT_FOR_GENERATION_FORMAT.format(instruction="ニューラルネットワークとは何ですか？")

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
input_length = inputs.input_ids.shape[1]

# Without streaming
with torch.no_grad():
    generation_output = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.01,
        top_p=0.01,
        top_k=60,
        repetition_penalty=1.1,
        return_dict_in_generate=True,
        remove_invalid_values=True,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Keep only the newly generated tokens, dropping the prompt
token = generation_output.sequences[0, input_length:]
output = tokenizer.decode(token)
print(output)

# Example output:
# ニューラルネットワーク（NN）は、人工知能の分野で使用される深い学習アルゴリズムの一種です。これらのアルゴリズムは、データを使って自動的に学習し、特定の目的を達成するために予測や決定を行うことができます。ニューラルネットワークは、多くの異なるアプリケーションで使用されており、自動車の運転システム、検索エンジン、画像認識などです。<|endoftext|>
```
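
The sample output ends with the `<|endoftext|>` marker. Passing `skip_special_tokens=True` to `tokenizer.decode` is one way to drop it, assuming the tokenizer registers that marker as a special token. As an alternative that does not depend on the tokenizer, the sketch below trims the marker with plain string handling. `strip_eos` and `EOS_TEXT` are hypothetical names introduced here, and the marker text is taken from the sample output.

```python
# End-of-sequence marker as it appears in the sample output (assumed literal text).
EOS_TEXT = "<|endoftext|>"


def strip_eos(text: str, eos: str = EOS_TEXT) -> str:
    """Return the text before the first end-of-sequence marker, without trailing whitespace."""
    head, _, _ = text.partition(eos)
    return head.rstrip()


print(strip_eos("検索エンジン、画像認識などです。<|endoftext|>"))
```

Cutting at the first marker also discards anything the model might emit after it, which is usually what is wanted when only one answer is expected.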

## Citation

```
@online{MosaicML2023Introducing,
    author  = {MosaicML NLP Team},
    title   = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
    year    = {2023},
    url     = {www.mosaicml.com/blog/mpt-7b},
    note    = {Accessed: 2023-03-28}, % change this date
    urldate = {2023-03-28} % change this date
}
```