--- license: apache-2.0 language: - ko pipeline_tag: question-answering --- # Model Card for Model ID This is a merged version from the trained QLoRa Adapter, [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K). Also the adapter was trained above the foundation model [meta-llama/Llama-2-7b-chat-hf](meta-llama/Llama-2-7b-chat-hf). ## Model Details ### Model Description - **Developed by:** [Jangmin Oh](https://huggingface.co/jangmin) - **Model type:** llama2 - **Language(s) (NLP):** ko - **License:** apche-2.0 - **Finetuned from model [optional]:** [meta-llama/Llama-2-7b-chat-hf](meta-llama/Llama-2-7b-chat-hf) ## Uses Step 1. load the model and the tokenizer. ```python merged_model_hub_id = 'jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K' tokenizer = AutoTokenizer.from_pretrained(merged_model_hub_id) model = AutoModelForCausalLM.from_pretrained(merged_model_hub_id, device_map="auto", torch_dtype=torch.float16, cache_dir=cache_dir) ``` Step 2. prepare auxiliary tools ```python instruction_prompt_template = """### 다음 주문 문장을 분석하여 음식명, 옵션명, 수량을 추출해줘. ### 명령: {0} ### 응답: """ def generate_helper(pipeline, query): prompt = instruction_prompt_template.format(query) out = pipeline(prompt, max_new_tokens=256, do_sample=False, eos_token_id=tokenizer.eos_token_id) generated_text = out[0]["generated_text"][len(prompt):] return generated_text pipe = pipeline("text-generation", model=model, tokenizer=tokenizer) stat_dic = pd.DataFrame.from_records([generate_helper(pipe, query) for query in evaluation_queries]) ``` Step 3. let's rock & roll. ```python print(generate_helpher(pipe, "아이스아메리카노 톨사이즈 한잔 하고요. 딸기스무디 한잔 주세요. 또, 콜드브루라떼 하나요.")) ``` ## Bias, Risks, and Limitations Please refer [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K) for the information about Bias, Risk, and Limitations. ## Training Details ### Training Procedure Please refer [jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K](jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K). You can find the fine-tuning strategy. ### Merging Procedure To merge the adapter on the pretrained model, I wrote following codes. Step 1. initialize. ```python from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig, pipeline from peft import PeftModel, PeftConfig, AutoPeftModelForCausalLM peft_model_id = "jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K" config = PeftConfig.from_pretrained(peft_model_id) IGNORE_INDEX = -100 DEFAULT_PAD_TOKEN = "[PAD]" ``` Step 2. load the fine-tuned model and the tokenzer. ```python device_map = "cpu" trained_model = AutoPeftModelForCausalLM.from_pretrained( peft_model_id, low_cpu_mem_usage=True, return_dict=True, torch_dtype=torch.float16, device_map=device_map, cache_dir=cache_dir ) tokenizer = AutoTokenizer.from_pretrained( config.base_model_name_or_path, padding_side='right', tokenizer_type="llama", trust_remote_code=True, cache_dir=cache_dir ) ``` Step 3. Modify the model and the tokenizer to treat the `PAD` token. (llama tokenizer needs to incorporate the pad token into the vocabulary. ) ```python def smart_tokenizer_and_embedding_resize( special_tokens_dict: Dict, tokenizer: transformers.PreTrainedTokenizer, model: transformers.PreTrainedModel, ): """Resize tokenizer and embedding. Note: This is the unoptimized version that may make your embedding size not be divisible by 64. """ num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict) model.resize_token_embeddings(len(tokenizer)) if num_new_tokens > 0: input_embeddings_data = model.get_input_embeddings().weight.data input_embeddings_avg = input_embeddings_data[:-num_new_tokens].mean( dim=0, keepdim=True ) input_embeddings_data[-num_new_tokens:] = input_embeddings_avg if with_pad_token and tokenizer._pad_token is None: smart_tokenizer_and_embedding_resize( special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN), tokenizer=tokenizer, model=trained_model, ) trained_model.config.pad_token_id = tokenizer.pad_token_id ``` Step 4. merge and push to hub. ```python merged_model = trained_model.merge_and_unload() hub_id = "jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K" merged_model.push_to_hub(hub_id, max_shard_size="4GB", safe_serialization=True, commit_message='recommit after pad_token was treated.') ```