Model Card for merged-llama2-7b-chat-hf-food-order-understanding-30K

This is a merged version of the trained QLoRA adapter jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K.

The adapter itself was trained on top of the foundation model meta-llama/Llama-2-7b-chat-hf.
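If you prefer to work with the adapter directly instead of this merged checkpoint, it can be attached to the base model with peft. A minimal sketch, assuming transformers and peft are installed:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the foundation model, then attach the QLoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")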

Model Details

Model Description

This model analyzes Korean food-ordering sentences and extracts the food name, option name, and quantity of each ordered item. It is the fp16 merge of the QLoRA adapter above into meta-llama/Llama-2-7b-chat-hf.

Uses

Step 1. Load the model and the tokenizer.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

cache_dir = None  # optional: a local directory for the Hugging Face cache

merged_model_hub_id = 'jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K'
tokenizer = AutoTokenizer.from_pretrained(merged_model_hub_id)
model = AutoModelForCausalLM.from_pretrained(merged_model_hub_id, device_map="auto", torch_dtype=torch.float16, cache_dir=cache_dir)
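If GPU memory is tight, the merged checkpoint can also be loaded with 4-bit quantization through bitsandbytes. This is not part of the original recipe, just a minimal sketch (requires the bitsandbytes package):

from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with fp16 compute; trades a little accuracy for memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    merged_model_hub_id, device_map="auto", quantization_config=bnb_config
)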

Step 2. Prepare auxiliary tools.

from transformers import pipeline
import pandas as pd

# The instruction template means: "Analyze the following order sentence and
# extract food names, option names, and quantities." / "### Command: {0} ### Response:"
instruction_prompt_template = """### ๋‹ค์Œ ์ฃผ๋ฌธ ๋ฌธ์žฅ์„ ๋ถ„์„ํ•˜์—ฌ ์Œ์‹๋ช…, ์˜ต์…˜๋ช…, ์ˆ˜๋Ÿ‰์„ ์ถ”์ถœํ•ด์ค˜.

### ๋ช…๋ น: {0} ### ์‘๋‹ต:
"""

def generate_helper(pipeline, query):
    prompt = instruction_prompt_template.format(query)

    out = pipeline(prompt, max_new_tokens=256, do_sample=False, eos_token_id=tokenizer.eos_token_id)

    # Strip the prompt so only the newly generated response is returned.
    generated_text = out[0]["generated_text"][len(prompt):]

    return generated_text

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Optional batch evaluation; evaluation_queries is assumed to be a list of
# order sentences that you supply (it is not defined in this card).
stat_dic = pd.DataFrame({"generated": [generate_helper(pipe, query) for query in evaluation_queries]})

Step 3. Let's rock & roll.

# The query means: "One tall-size iced Americano, one strawberry smoothie,
# and also one cold brew latte, please."
print(generate_helper(pipe, "์•„์ด์Šค์•„๋ฉ”๋ฆฌ์นด๋…ธ ํ†จ์‚ฌ์ด์ฆˆ ํ•œ์ž” ํ•˜๊ณ ์š”. ๋”ธ๊ธฐ์Šค๋ฌด๋”” ํ•œ์ž” ์ฃผ์„ธ์š”. ๋˜, ์ฝœ๋“œ๋ธŒ๋ฃจ๋ผ๋–ผ ํ•˜๋‚˜์š”."))

Bias, Risks, and Limitations

Please refer to jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K for information about bias, risks, and limitations.

Training Details

Training Procedure

Please refer to jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K, where the fine-tuning strategy is described.

Merging Procedure

To merge the adapter into the pretrained model, I wrote the following code.

Step 1. Initialize.

import torch
import transformers
from typing import Dict
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig, pipeline
from peft import PeftModel, PeftConfig, AutoPeftModelForCausalLM

peft_model_id = "jangmin/qlora-llama2-7b-chat-hf-food-order-understanding-30K"
config = PeftConfig.from_pretrained(peft_model_id)

cache_dir = None  # optional: a local directory for the Hugging Face cache

IGNORE_INDEX = -100  # kept from the training script; unused during merging
DEFAULT_PAD_TOKEN = "[PAD]"

Step 2. Load the fine-tuned model and the tokenizer.

# Merging is done on CPU in fp16; no GPU is required for this step.
device_map = "cpu"
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
    cache_dir=cache_dir
)

tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,     
    padding_side='right',
    tokenizer_type="llama",
    trust_remote_code=True,
    cache_dir=cache_dir
)

Step 3. Modify the model and the tokenizer to handle the PAD token. (The Llama tokenizer needs the pad token added to its vocabulary.)

def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeddings_data = model.get_input_embeddings().weight.data

        # Initialize each new embedding row with the mean of the pre-existing
        # rows, a more stable starting point than random initialization.
        input_embeddings_avg = input_embeddings_data[:-num_new_tokens].mean(
            dim=0, keepdim=True
        )

        input_embeddings_data[-num_new_tokens:] = input_embeddings_avg

with_pad_token = True  # add a [PAD] token when the tokenizer does not define one

if with_pad_token and tokenizer._pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=trained_model,
    )
    trained_model.config.pad_token_id = tokenizer.pad_token_id
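A quick sanity check after this step (illustrative; not in the original script): the pad token should now be registered and the embedding matrix should match the enlarged vocabulary.

# Illustrative checks, not part of the original merging code.
assert tokenizer.pad_token == DEFAULT_PAD_TOKEN
assert trained_model.get_input_embeddings().weight.shape[0] == len(tokenizer)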

Step 4. Merge and push to the Hub.

# Fold the LoRA weights into the base model and drop the PEFT wrappers.
merged_model = trained_model.merge_and_unload()

hub_id = "jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K"

merged_model.push_to_hub(hub_id, max_shard_size="4GB", safe_serialization=True, commit_message='recommit after pad_token was treated.')
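Because the Uses section loads the tokenizer from the merged repository, the resized tokenizer presumably has to be pushed alongside the weights. A minimal sketch:

# Push the resized tokenizer so the merged repository is self-contained.
tokenizer.push_to_hub(hub_id)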