|
--- |
|
license: other |
|
language: |
|
- ko |
|
pipeline_tag: text-generation
|
--- |
|
# Model Card for jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K
|
|
|
|
|
|
This model is a merged version of the trained QLoRA adapter [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K).
|
|
|
The adapter was trained on top of the foundation model [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
|
- **Developed by:** [Jangmin Oh](https://huggingface.co/jangmin) |
|
- **Model type:** llama2 |
|
- **Language(s) (NLP):** ko |
|
- **License:** You should comply with Meta's Llama 2 license. Please visit: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
|
- **Finetuned from model:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
|
|
|
## Uses |
|
|
|
Step 1. Load the model and the tokenizer.
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

cache_dir = None  # optionally set a local cache directory

merged_model_hub_id = 'jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K'

tokenizer = AutoTokenizer.from_pretrained(merged_model_hub_id)
model = AutoModelForCausalLM.from_pretrained(merged_model_hub_id, device_map="auto", torch_dtype=torch.float16, cache_dir=cache_dir)
```
|
|
|
Step 2. Prepare auxiliary tools.
|
|
|
```python
import pandas as pd
from transformers import pipeline

# Prompt template used during fine-tuning.
# English: "### Analyze the following order sentence and extract the food names,
# option names, and quantities. ### Command: {0} ### Response:"
instruction_prompt_template = """### 다음 주문 문장을 분석하여 음식명, 옵션명, 수량을 추출해줘.

### 명령: {0} ### 응답:
"""

def generate_helper(pipeline, query):
    prompt = instruction_prompt_template.format(query)

    out = pipeline(prompt, max_new_tokens=256, do_sample=False, eos_token_id=tokenizer.eos_token_id)

    # Strip the prompt so that only the generated answer remains.
    generated_text = out[0]["generated_text"][len(prompt):]

    return generated_text

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Batch evaluation over a list of order sentences; `evaluation_queries` must be defined by you.
stat_dic = pd.DataFrame({"generated": [generate_helper(pipe, query) for query in evaluation_queries]})
```
|
|
|
Step 3. Let's rock & roll.
|
|
|
```python
# English: "One tall-size iced americano, please. One strawberry smoothie, please. Also, one cold brew latte."
print(generate_helper(pipe, "아이스아메리카노 톨사이즈 한잔 하고요. 딸기스무디 한잔 주세요. 또, 콜드브루라떼 하나요."))
```
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K) for information about bias, risks, and limitations.
|
|
|
|
|
## Training Details |
|
|
|
### Training Procedure |
|
|
|
Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K), where the fine-tuning strategy is described. For orientation, a generic sketch of the QLoRA recipe follows below.
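
The block below is only a minimal sketch of a typical QLoRA setup with `peft` and `bitsandbytes`; the hyperparameters (`r`, `lora_alpha`, `target_modules`, dropout) are illustrative assumptions, not the values actually used for this adapter. Consult the adapter card for the real configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Attach trainable low-rank adapters; r/alpha/target_modules are illustrative values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```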
|
|
|
### Merging Procedure |
|
|
|
To merge the adapter into the pretrained model, I wrote the following code.
|
|
|
Step 1. Initialize.
|
|
|
```python
import torch
import transformers
from typing import Dict
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig, pipeline
from peft import PeftModel, PeftConfig, AutoPeftModelForCausalLM

peft_model_id = "jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K"
config = PeftConfig.from_pretrained(peft_model_id)

cache_dir = None  # optionally set a local cache directory

IGNORE_INDEX = -100
DEFAULT_PAD_TOKEN = "[PAD]"
```
|
|
|
Step 2. Load the fine-tuned model and the tokenizer.
|
```python
device_map = "cpu"  # merging happens on CPU; no GPU is required

# Load the adapter together with its base model weights.
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
    cache_dir=cache_dir,
)

# Load the tokenizer of the base model.
tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,
    padding_side='right',
    tokenizer_type="llama",
    trust_remote_code=True,
    cache_dir=cache_dir,
)
```
|
|
|
Step 3. Modify the model and the tokenizer to handle the `PAD` token. (The Llama tokenizer does not define a pad token, so it must be added to the vocabulary.)
|
|
|
```python
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeddings_data = model.get_input_embeddings().weight.data

        # Initialize each new embedding row with the mean of the pre-existing rows.
        input_embeddings_avg = input_embeddings_data[:-num_new_tokens].mean(
            dim=0, keepdim=True
        )

        input_embeddings_data[-num_new_tokens:] = input_embeddings_avg

if tokenizer.pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=trained_model,
    )
trained_model.config.pad_token_id = tokenizer.pad_token_id
```
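
If you want to avoid the divisibility caveat mentioned in the docstring, recent releases of `transformers` can pad the embedding matrix for you. A one-line alternative, assuming a version whose `resize_token_embeddings` supports the `pad_to_multiple_of` argument:

```python
# Assumes a transformers release where resize_token_embeddings accepts pad_to_multiple_of.
trained_model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
```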
|
|
|
Step 4. Merge and push to hub.
|
|
|
```python
# Fold the LoRA weights into the base model and drop the adapter wrappers.
merged_model = trained_model.merge_and_unload()

hub_id = "jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K"

merged_model.push_to_hub(hub_id, max_shard_size="4GB", safe_serialization=True, commit_message='recommit after pad_token was treated.')
```
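
As a quick sanity check (a sketch, not part of the original procedure), the merged checkpoint can be reloaded as a plain `transformers` model with no `peft` dependency. This assumes the resized tokenizer was pushed to the same repo, e.g. via `tokenizer.push_to_hub(hub_id)`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged model from the Hub; peft is no longer required.
check_tokenizer = AutoTokenizer.from_pretrained(hub_id)
check_model = AutoModelForCausalLM.from_pretrained(hub_id, torch_dtype=torch.float16)

# After adding the PAD token, the embedding matrix should match the tokenizer size.
assert check_model.get_input_embeddings().weight.shape[0] == len(check_tokenizer)
```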