---
license: other
language:
- ko
pipeline_tag: question-answering
---
# Model Card for jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K


This is a merged version of the trained QLoRA adapter [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K).

The adapter was trained on top of the foundation model [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).

## Model Details

### Model Description


- **Developed by:** [Jangmin Oh](https://huggingface.co/jangmin)
- **Model type:** llama2
- **Language(s) (NLP):** ko
- **License:** You should comply with Meta's Llama 2 license. Please visit: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
- **Finetuned from model:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

## Uses

Step 1. load the model and the tokenizer.

  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  cache_dir = None  # optionally set to a local directory for caching downloads

  merged_model_hub_id = 'jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K'
  tokenizer = AutoTokenizer.from_pretrained(merged_model_hub_id)
  model = AutoModelForCausalLM.from_pretrained(merged_model_hub_id, device_map="auto", torch_dtype=torch.float16, cache_dir=cache_dir)
  ```

Step 2. prepare auxiliary tools.

  ```python
  from transformers import pipeline
  import pandas as pd

  # Prompt template (Korean). English gloss:
  # "### Analyze the following order sentence and extract food names, option names, and quantities.
  #  ### Instruction: {0} ### Response:"
  instruction_prompt_template = """### ๋‹ค์Œ ์ฃผ๋ฌธ ๋ฌธ์žฅ์„ ๋ถ„์„ํ•˜์—ฌ ์Œ์‹๋ช…, ์˜ต์…˜๋ช…, ์ˆ˜๋Ÿ‰์„ ์ถ”์ถœํ•ด์ค˜.

  ### ๋ช…๋ น: {0} ### ์‘๋‹ต:
  """

  def generate_helper(pipe, query):
      prompt = instruction_prompt_template.format(query)
      out = pipe(prompt, max_new_tokens=256, do_sample=False, eos_token_id=tokenizer.eos_token_id)
      # Strip the prompt so that only the newly generated text remains.
      generated_text = out[0]["generated_text"][len(prompt):]
      return generated_text

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

  # Optional: batch-evaluate your own list of test sentences (evaluation_queries).
  stat_dic = pd.DataFrame({"response": [generate_helper(pipe, query) for query in evaluation_queries]})
  ```
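
Note that `do_sample=False` makes decoding greedy and deterministic, which suits a structured-extraction task like order understanding better than sampled generation.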

Step 3. let's rock & roll.

  ```python
  # "One tall iced Americano, please. One strawberry smoothie, please. And one cold brew latte."
  print(generate_helper(pipe, "์•„์ด์Šค์•„๋ฉ”๋ฆฌ์นด๋…ธ ํ†จ์‚ฌ์ด์ฆˆ ํ•œ์ž” ํ•˜๊ณ ์š”. ๋”ธ๊ธฐ์Šค๋ฌด๋”” ํ•œ์ž” ์ฃผ์„ธ์š”. ๋˜, ์ฝœ๋“œ๋ธŒ๋ฃจ๋ผ๋–ผ ํ•˜๋‚˜์š”."))
  ```

## Bias, Risks, and Limitations

Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K) for information about bias, risks, and limitations.


## Training Details

### Training Procedure

Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K), where you can find the fine-tuning strategy.
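
For orientation, the sketch below shows what a typical QLoRA setup looks like (a 4-bit quantized base model plus a trainable LoRA adapter). The concrete hyperparameters here (rank, alpha, dropout, target modules) are illustrative assumptions, not the values used for this adapter; the authoritative configuration lives in the adapter repository.

```python
# A minimal QLoRA-style setup sketch (assumed hyperparameters, illustration only).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize the base weights to 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=16,               # assumed rank
    lora_alpha=32,      # assumed scaling
    lora_dropout=0.05,  # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)  # only the LoRA weights train
```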

### Merging Procedure

To merge the adapter into the pretrained model, I wrote the following code.

Step 1. initialize.

```python
import torch
import transformers
from typing import Dict

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig, pipeline
from peft import PeftModel, PeftConfig, AutoPeftModelForCausalLM

cache_dir = None  # optionally set to a local directory for caching downloads

peft_model_id = "jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K"
config = PeftConfig.from_pretrained(peft_model_id)

IGNORE_INDEX = -100
DEFAULT_PAD_TOKEN = "[PAD]"
```

Step 2. load the fine-tuned model and the tokenizer.
```python
device_map = "cpu"
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
    cache_dir=cache_dir
)

tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,     
    padding_side='right',
    tokenizer_type="llama",
    trust_remote_code=True,
    cache_dir=cache_dir
)
```
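
Note that `device_map="cpu"` is intentional here: merging only rewrites the fp16 weight tensors, so this step can run entirely in CPU RAM without a GPU.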

Step 3. modify the model and the tokenizer to handle the `PAD` token (the Llama tokenizer needs the pad token added to its vocabulary).

```python
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeddings_data = model.get_input_embeddings().weight.data

        # Initialize each new token's embedding with the average of the
        # pre-existing embeddings (a common heuristic for newly added tokens).
        input_embeddings_avg = input_embeddings_data[:-num_new_tokens].mean(
            dim=0, keepdim=True
        )

        input_embeddings_data[-num_new_tokens:] = input_embeddings_avg

# Add "[PAD]" only if the tokenizer does not already define a pad token.
if tokenizer._pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=trained_model,
    )
    trained_model.config.pad_token_id = tokenizer.pad_token_id
```
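
As an optional sanity check, the tokenizer and the resized embedding matrix should now agree on the vocabulary size:

```python
# Both should report the original vocabulary size plus one for "[PAD]".
assert trained_model.get_input_embeddings().weight.shape[0] == len(tokenizer)
print(tokenizer.pad_token, trained_model.config.pad_token_id)
```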

Step 4. merge and push to hub.

```python
merged_model = trained_model.merge_and_unload()

hub_id = "jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K"

merged_model.push_to_hub(hub_id, max_shard_size="4GB", safe_serialization=True, commit_message='recommit after pad_token was treated.')
```
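
Since the tokenizer was resized to include `[PAD]`, you will likely also want to push it alongside the merged weights, so that `AutoTokenizer.from_pretrained(hub_id)` picks up the new token:

```python
# Push the resized tokenizer to the same repository as the merged model.
tokenizer.push_to_hub(hub_id)
```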