---
license: other
language:
- ko
pipeline_tag: text-generation
---
# Model Card for jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K

This model is a merged version of the trained QLoRA adapter [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K), which was trained on top of the foundation model [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
## Model Details
### Model Description
- **Developed by:** [Jangmin Oh](https://huggingface.co/jangmin)
- **Model type:** llama2
- **Language(s) (NLP):** ko
- **License:** You should comply with Meta's Llama 2 license. Please visit: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
- **Finetuned from model:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
## Uses
Step 1. Load the model and the tokenizer.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

merged_model_hub_id = 'jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K'
cache_dir = None  # optionally, a local directory for the downloaded weights

tokenizer = AutoTokenizer.from_pretrained(merged_model_hub_id)
model = AutoModelForCausalLM.from_pretrained(merged_model_hub_id, device_map="auto", torch_dtype=torch.float16, cache_dir=cache_dir)
```
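If GPU memory is tight, the merged model can alternatively be loaded in 4-bit via bitsandbytes. This is an optional, illustrative sketch; the quantization settings below are assumptions, not the configuration used for training.
```python
from transformers import BitsAndBytesConfig

# Illustrative 4-bit loading; requires the bitsandbytes package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    merged_model_hub_id, device_map="auto", quantization_config=bnb_config
)
```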
Step 2. Prepare auxiliary tools.
```python
import pandas as pd
from transformers import pipeline

# In English, the instruction reads:
# "### Analyze the following order sentence and extract the food names, option names, and quantities."
instruction_prompt_template = """### ๋‹ค์Œ ์ฃผ๋ฌธ ๋ฌธ์žฅ์„ ๋ถ„์„ํ•˜์—ฌ ์Œ์‹๋ช…, ์˜ต์…˜๋ช…, ์ˆ˜๋Ÿ‰์„ ์ถ”์ถœํ•ด์ค˜.
### ๋ช…๋ น: {0} ### ์‘๋‹ต:
"""

def generate_helper(pipe, query):
    prompt = instruction_prompt_template.format(query)
    out = pipe(prompt, max_new_tokens=256, do_sample=False, eos_token_id=tokenizer.eos_token_id)
    # Strip the prompt, keeping only the newly generated continuation.
    generated_text = out[0]["generated_text"][len(prompt):]
    return generated_text

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# evaluation_queries: a user-supplied list of order sentences to evaluate.
stat_dic = pd.DataFrame.from_records([generate_helper(pipe, query) for query in evaluation_queries])
```
Step 3. Let's rock & roll.
```python
# "One tall iced americano, please. One strawberry smoothie. And one cold brew latte."
print(generate_helper(pipe, "์•„์ด์Šค์•„๋ฉ”๋ฆฌ์นด๋…ธ ํ†จ์‚ฌ์ด์ฆˆ ํ•œ์ž” ํ•˜๊ณ ์š”. ๋”ธ๊ธฐ์Šค๋ฌด๋”” ํ•œ์ž” ์ฃผ์„ธ์š”. ๋˜, ์ฝœ๋“œ๋ธŒ๋ฃจ๋ผ๋–ผ ํ•˜๋‚˜์š”."))
```
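If you need structured data downstream, the generated text can be parsed into fields. The exact response format depends on the fine-tuning data, so the helper below (`parse_order_response`, a hypothetical name) is only a sketch; adjust the pattern to the responses you actually observe.
```python
import re

def parse_order_response(text: str):
    """Hypothetical parser: assumes each ordered item appears on its own line
    with comma-separated key:value fields (์Œ์‹๋ช…, ์˜ต์…˜๋ช…, ์ˆ˜๋Ÿ‰)."""
    items = []
    for line in text.splitlines():
        fields = dict(re.findall(r"(์Œ์‹๋ช…|์˜ต์…˜๋ช…|์ˆ˜๋Ÿ‰)\s*:\s*([^,]+)", line))
        if fields:
            items.append({key: value.strip() for key, value in fields.items()})
    return items
```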
## Bias, Risks, and Limitations
Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K) for information about bias, risks, and limitations.
## Training Details
### Training Procedure
Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K) for the fine-tuning strategy.
### Merging Procedure
To merge the adapter into the pretrained model, I wrote the following code.
Step 1. Initialize.
```python
import torch
import transformers
from typing import Dict
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig, pipeline
from peft import PeftModel, PeftConfig, AutoPeftModelForCausalLM

peft_model_id = "jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K"
config = PeftConfig.from_pretrained(peft_model_id)

cache_dir = None  # optionally, a local directory for the downloaded weights
IGNORE_INDEX = -100
DEFAULT_PAD_TOKEN = "[PAD]"
```
Step 2. Load the fine-tuned model and the tokenizer.
```python
device_map = "cpu"  # merging runs on CPU, so no GPU is required

trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
    cache_dir=cache_dir,
)

tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,
    padding_side='right',
    tokenizer_type="llama",
    trust_remote_code=True,
    cache_dir=cache_dir,
)
```
Step 3. Modify the model and the tokenizer to handle the `PAD` token (the Llama tokenizer does not ship a pad token in its vocabulary, so one has to be added).
```python
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: this is the unoptimized version, which may make the embedding size
    not divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))
    if num_new_tokens > 0:
        # Initialize the new embedding rows with the mean of the existing ones.
        input_embeddings_data = model.get_input_embeddings().weight.data
        input_embeddings_avg = input_embeddings_data[:-num_new_tokens].mean(
            dim=0, keepdim=True
        )
        input_embeddings_data[-num_new_tokens:] = input_embeddings_avg

if tokenizer._pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=trained_model,
    )
trained_model.config.pad_token_id = tokenizer.pad_token_id
```
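As a quick sanity check (not part of the original script), you can confirm that the pad token was registered and that the embedding matrix grew to match the vocabulary:
```python
# Illustrative check: the pad token should resolve to a valid id,
# and the embedding rows should match the tokenizer's vocabulary size.
assert tokenizer.pad_token == DEFAULT_PAD_TOKEN
assert trained_model.config.pad_token_id == tokenizer.pad_token_id
assert trained_model.get_input_embeddings().weight.shape[0] == len(tokenizer)
```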
Step 4. Merge and push to the Hub.
```python
merged_model = trained_model.merge_and_unload()
hub_id = "jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K"
merged_model.push_to_hub(hub_id, max_shard_size="4GB", safe_serialization=True, commit_message='recommit after pad_token was treated.')
```
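The script above pushes only the model weights. Because the tokenizer gained a `[PAD]` token, you will likely also want to publish the resized tokenizer and verify the round trip; a minimal sketch:
```python
# Push the resized tokenizer so downstream users load a matching vocabulary.
tokenizer.push_to_hub(hub_id)

# Optional round-trip check: reload from the Hub and compare embedding sizes.
reloaded = AutoModelForCausalLM.from_pretrained(hub_id, torch_dtype=torch.float16)
assert reloaded.get_input_embeddings().weight.shape[0] == len(tokenizer)
```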