---
license: other
language:
- ko
pipeline_tag: question-answering
---
# Model Card for jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K


This is a merged version of the trained QLoRA adapter [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K).

The adapter was trained on top of the foundation model [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).

## Model Details

### Model Description


- **Developed by:** [Jangmin Oh](https://huggingface.co/jangmin)
- **Model type:** llama2
- **Language(s) (NLP):** ko
- **License:** You should comply with Meta's Llama 2 license. Please visit: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
- **Finetuned from model:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

## Uses

Step 1. load the model and the tokenizer.

  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  cache_dir = None  # optionally set to a local directory for caching downloads

  merged_model_hub_id = 'jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K'
  tokenizer = AutoTokenizer.from_pretrained(merged_model_hub_id)
  model = AutoModelForCausalLM.from_pretrained(merged_model_hub_id, device_map="auto", torch_dtype=torch.float16, cache_dir=cache_dir)
  ```

Step 2. prepare auxiliary tools.

  ```python
  from transformers import pipeline
  import pandas as pd

  # Prompt template (Korean). English gloss:
  # "### Analyze the following order sentence and extract food names, option names, and quantities.
  #  ### Instruction: {0} ### Response:"
  instruction_prompt_template = """### ๋‹ค์Œ ์ฃผ๋ฌธ ๋ฌธ์žฅ์„ ๋ถ„์„ํ•˜์—ฌ ์Œ์‹๋ช…, ์˜ต์…˜๋ช…, ์ˆ˜๋Ÿ‰์„ ์ถ”์ถœํ•ด์ค˜.

  ### ๋ช…๋ น: {0} ### ์‘๋‹ต:
  """

  def generate_helper(pipe, query):
      prompt = instruction_prompt_template.format(query)
      out = pipe(prompt, max_new_tokens=256, do_sample=False, eos_token_id=tokenizer.eos_token_id)
      # Strip the prompt so that only the newly generated text remains.
      generated_text = out[0]["generated_text"][len(prompt):]
      return generated_text

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

  # Optional: batch-evaluate your own list of test sentences (evaluation_queries).
  stat_dic = pd.DataFrame({"response": [generate_helper(pipe, query) for query in evaluation_queries]})
  ```
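
Note that `do_sample=False` makes decoding greedy and deterministic, which suits a structured-extraction task like order understanding better than sampled generation.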

Step 3. let's rock & roll.

  ```python
  # "One tall iced Americano, please. One strawberry smoothie, please. And one cold brew latte."
  print(generate_helper(pipe, "์•„์ด์Šค์•„๋ฉ”๋ฆฌ์นด๋…ธ ํ†จ์‚ฌ์ด์ฆˆ ํ•œ์ž” ํ•˜๊ณ ์š”. ๋”ธ๊ธฐ์Šค๋ฌด๋”” ํ•œ์ž” ์ฃผ์„ธ์š”. ๋˜, ์ฝœ๋“œ๋ธŒ๋ฃจ๋ผ๋–ผ ํ•˜๋‚˜์š”."))
  ```

## Bias, Risks, and Limitations

Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K) for information about bias, risks, and limitations.


## Training Details

### Training Procedure

Please refer to [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K), where you can find the fine-tuning strategy.
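
For orientation, the sketch below shows what a typical QLoRA setup looks like (a 4-bit quantized base model plus a trainable LoRA adapter). The concrete hyperparameters here (rank, alpha, dropout, target modules) are illustrative assumptions, not the values used for this adapter; the authoritative configuration lives in the adapter repository.

```python
# A minimal QLoRA-style setup sketch (assumed hyperparameters, illustration only).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize the base weights to 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=16,               # assumed rank
    lora_alpha=32,      # assumed scaling
    lora_dropout=0.05,  # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)  # only the LoRA weights train
```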

### Merging Procedure

To merge the adapter into the pretrained model, I wrote the following code.

Step 1. initialize.

```python
import torch
import transformers
from typing import Dict

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig, pipeline
from peft import PeftModel, PeftConfig, AutoPeftModelForCausalLM

cache_dir = None  # optionally set to a local directory for caching downloads

peft_model_id = "jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K"
config = PeftConfig.from_pretrained(peft_model_id)

IGNORE_INDEX = -100
DEFAULT_PAD_TOKEN = "[PAD]"
```

Step 2. load the fine-tuned model and the tokenizer.
```python
device_map = "cpu"
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
    cache_dir=cache_dir
)

tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,     
    padding_side='right',
    tokenizer_type="llama",
    trust_remote_code=True,
    cache_dir=cache_dir
)
```
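
Note that `device_map="cpu"` is intentional here: merging only rewrites the fp16 weight tensors, so this step can run entirely in CPU RAM without a GPU.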

Step 3. modify the model and the tokenizer to handle the `PAD` token (the Llama tokenizer needs the pad token added to its vocabulary).

```python
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeddings_data = model.get_input_embeddings().weight.data

        # Initialize each new token's embedding with the average of the
        # pre-existing embeddings (a common heuristic for newly added tokens).
        input_embeddings_avg = input_embeddings_data[:-num_new_tokens].mean(
            dim=0, keepdim=True
        )

        input_embeddings_data[-num_new_tokens:] = input_embeddings_avg

# Add "[PAD]" only if the tokenizer does not already define a pad token.
if tokenizer._pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=trained_model,
    )
    trained_model.config.pad_token_id = tokenizer.pad_token_id
```
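
As an optional sanity check, the tokenizer and the resized embedding matrix should now agree on the vocabulary size:

```python
# Both should report the original vocabulary size plus one for "[PAD]".
assert trained_model.get_input_embeddings().weight.shape[0] == len(tokenizer)
print(tokenizer.pad_token, trained_model.config.pad_token_id)
```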

Step 4. merge and push to hub.

```python
merged_model = trained_model.merge_and_unload()

hub_id = "jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K"

merged_model.push_to_hub(hub_id, max_shard_size="4GB", safe_serialization=True, commit_message='recommit after pad_token was treated.')
```
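
Since the tokenizer was resized to include `[PAD]`, you will likely also want to push it alongside the merged weights, so that `AutoTokenizer.from_pretrained(hub_id)` picks up the new token:

```python
# Push the resized tokenizer to the same repository as the merged model.
tokenizer.push_to_hub(hub_id)
```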