|
--- |
|
license: other |
|
language: |
|
- ko |
|
pipeline_tag: text-generation
|
--- |
|
# Model Card for jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K
|
|
|
|
|
|
This model is a merged version of the trained QLoRA adapter [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K).
|
|
|
The adapter was trained on top of the foundation model [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
|
- **Developed by:** [Jangmin Oh](https://huggingface.co/jangmin) |
|
- **Model type:** llama2 |
|
- **Language(s) (NLP):** ko |
|
- **License:** You should comply with Meta's Llama 2 license. Please visit: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
|
- **Finetuned from model:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
|
|
|
## Uses |
|
|
|
Step 1. Load the model and the tokenizer.
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

cache_dir = None  # optionally set a local cache directory

merged_model_hub_id = 'jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K'

tokenizer = AutoTokenizer.from_pretrained(merged_model_hub_id)
model = AutoModelForCausalLM.from_pretrained(merged_model_hub_id, device_map="auto", torch_dtype=torch.float16, cache_dir=cache_dir)
```
|
|
|
Step 2. Prepare auxiliary tools.
|
|
|
```python
import pandas as pd
from transformers import pipeline

# Prompt template used during fine-tuning.
# English: "### Analyze the following order sentence and extract the food names,
# option names, and quantities. ### Command: {0} ### Response:"
instruction_prompt_template = """### 다음 주문 문장을 분석하여 음식명, 옵션명, 수량을 추출해줘.

### 명령: {0} ### 응답:
"""

def generate_helper(pipeline, query):
    prompt = instruction_prompt_template.format(query)

    out = pipeline(prompt, max_new_tokens=256, do_sample=False, eos_token_id=tokenizer.eos_token_id)

    # Strip the prompt so that only the generated answer remains.
    generated_text = out[0]["generated_text"][len(prompt):]

    return generated_text

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Batch evaluation over a list of order sentences; `evaluation_queries` must be defined by you.
stat_dic = pd.DataFrame({"generated": [generate_helper(pipe, query) for query in evaluation_queries]})
```
|
|
|
Step 3. Let's rock & roll.
|
|
|
```python
# English: "One tall-size iced americano, please. One strawberry smoothie, please. Also, one cold brew latte."
print(generate_helper(pipe, "아이스아메리카노 톨사이즈 한잔 하고요. 딸기스무디 한잔 주세요. 또, 콜드브루라떼 하나요."))
```
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K) for information about bias, risks, and limitations.
|
|
|
|
|
## Training Details |
|
|
|
### Training Procedure |
|
|
|
Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K), where the fine-tuning strategy is described. For orientation, a generic sketch of the QLoRA recipe follows below.
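
The block below is only a minimal sketch of a typical QLoRA setup with `peft` and `bitsandbytes`; the hyperparameters (`r`, `lora_alpha`, `target_modules`, dropout) are illustrative assumptions, not the values actually used for this adapter. Consult the adapter card for the real configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Attach trainable low-rank adapters; r/alpha/target_modules are illustrative values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```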
|
|
|
### Merging Procedure |
|
|
|
To merge the adapter into the pretrained model, I wrote the following code.
|
|
|
Step 1. Initialize.
|
|
|
```python
import torch
import transformers
from typing import Dict
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig, pipeline
from peft import PeftModel, PeftConfig, AutoPeftModelForCausalLM

peft_model_id = "jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K"
config = PeftConfig.from_pretrained(peft_model_id)

cache_dir = None  # optionally set a local cache directory

IGNORE_INDEX = -100
DEFAULT_PAD_TOKEN = "[PAD]"
```
|
|
|
Step 2. Load the fine-tuned model and the tokenizer.
|
```python
device_map = "cpu"  # merging happens on CPU; no GPU is required

# Load the adapter together with its base model weights.
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
    cache_dir=cache_dir,
)

# Load the tokenizer of the base model.
tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,
    padding_side='right',
    tokenizer_type="llama",
    trust_remote_code=True,
    cache_dir=cache_dir,
)
```
|
|
|
Step 3. Modify the model and the tokenizer to handle the `PAD` token. (The Llama tokenizer does not define a pad token, so it must be added to the vocabulary.)
|
|
|
```python
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeddings_data = model.get_input_embeddings().weight.data

        # Initialize each new embedding row with the mean of the pre-existing rows.
        input_embeddings_avg = input_embeddings_data[:-num_new_tokens].mean(
            dim=0, keepdim=True
        )

        input_embeddings_data[-num_new_tokens:] = input_embeddings_avg

if tokenizer.pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=trained_model,
    )
trained_model.config.pad_token_id = tokenizer.pad_token_id
```
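
If you want to avoid the divisibility caveat mentioned in the docstring, recent releases of `transformers` can pad the embedding matrix for you. A one-line alternative, assuming a version whose `resize_token_embeddings` supports the `pad_to_multiple_of` argument:

```python
# Assumes a transformers release where resize_token_embeddings accepts pad_to_multiple_of.
trained_model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
```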
|
|
|
Step 4. Merge and push to hub.
|
|
|
```python
# Fold the LoRA weights into the base model and drop the adapter wrappers.
merged_model = trained_model.merge_and_unload()

hub_id = "jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K"

merged_model.push_to_hub(hub_id, max_shard_size="4GB", safe_serialization=True, commit_message='recommit after pad_token was treated.')
```
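
As a quick sanity check (a sketch, not part of the original procedure), the merged checkpoint can be reloaded as a plain `transformers` model with no `peft` dependency. This assumes the resized tokenizer was pushed to the same repo, e.g. via `tokenizer.push_to_hub(hub_id)`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged model from the Hub; peft is no longer required.
check_tokenizer = AutoTokenizer.from_pretrained(hub_id)
check_model = AutoModelForCausalLM.from_pretrained(hub_id, torch_dtype=torch.float16)

# After adding the PAD token, the embedding matrix should match the tokenizer size.
assert check_model.get_input_embeddings().weight.shape[0] == len(check_tokenizer)
```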