Model Description
Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. The model has been fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, making it efficient for deployment on resource-constrained hardware.
Model Architecture
Base Model: Llama-3.2-1B
Parameters: Approximately 1 Billion
Quantization: 4-bit using the bitsandbytes library
Fine-tuning Method: PEFT with LoRA
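To give a rough sense of why 4-bit quantization matters on small GPUs, the sketch below estimates the storage needed for the weights alone. It assumes exactly 1 billion parameters (the card says "approximately") and ignores quantization constants, any layers kept in higher precision, activations, and the KV cache, so real memory use will be higher. The helper name weight_bytes is illustrative and not part of any library.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in bytes."""
    return num_params * bits_per_param / 8


# Assumption: "approximately 1 billion" parameters, rounded to exactly 1e9.
NUM_PARAMS = 1_000_000_000

fp16_bytes = weight_bytes(NUM_PARAMS, 16)  # half-precision baseline
int4_bytes = weight_bytes(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"FP16 weights : {fp16_bytes / 1e9:.2f} GB")
print(f"4-bit weights: {int4_bytes / 1e9:.2f} GB")
print(f"Reduction    : {fp16_bytes / int4_bytes:.0f}x")
```

Under these assumptions the weights shrink by a factor of four relative to FP16, which is the main reason a 1B-parameter model fits comfortably on an 8 GB card.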
Training Data
The model was fine-tuned on the Databricks Dolly 15k Subset for General QA, a subset of the larger Databricks Dolly 15k dataset that focuses on general question-answering tasks.
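The card does not say how the subset was derived. One plausible route, shown here purely as an assumption, is to keep only the Dolly 15k records whose category field is general_qa. The records below are hypothetical stand-ins that follow the Dolly 15k layout (instruction, context, response, category), and select_category is an illustrative helper, not a function from the datasets library.

```python
# Hypothetical records in the Dolly 15k layout; not taken from the real dataset.
records = [
    {
        "instruction": "Why is the sky blue?",
        "context": "",
        "response": "Air scatters blue light more than red light.",
        "category": "general_qa",
    },
    {
        "instruction": "Summarize the passage.",
        "context": "Some passage.",
        "response": "A summary.",
        "category": "summarization",
    },
    {
        "instruction": "What is a good way to learn a language?",
        "context": "",
        "response": "Practice a little every day.",
        "category": "general_qa",
    },
]


def select_category(rows, category="general_qa"):
    """Keep only the rows whose 'category' field matches (assumed subset rule)."""
    return [row for row in rows if row["category"] == category]


general_qa = select_category(records)
print(f"Kept {len(general_qa)} of {len(records)} records")
```

The same filter could be applied to the full dataset once it is loaded; the in-memory list is used here only to keep the sketch self-contained.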
Training Procedure
Fine-tuning Configuration:
LoRA Rank (r): 8
LoRA Alpha: 16
LoRA Dropout: 0.5
Number of Epochs: 30
Batch Size: 2 (per device)
Learning Rate: 2e-5
Evaluation Strategy: Evaluated at each epoch
Optimizer: AdamW
Mixed Precision: FP16
Hardware Used: Single RTX 4070 8GB
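As a rough illustration of what the LoRA settings above imply, the sketch below computes the scaling factor alpha / r used in the standard LoRA formulation and the number of trainable parameters a rank-8 adapter adds to one weight matrix. The 2048 x 2048 matrix shape is an assumption chosen for illustration; the card does not state which modules were adapted. Both helper names are hypothetical.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted matrix: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 8, 16     # values listed in the fine-tuning configuration
D_IN = D_OUT = 2048  # assumption: one square projection matrix

scaling = lora_scaling(ALPHA, R)
adapter_params = lora_param_count(D_IN, D_OUT, R)
full_params = D_IN * D_OUT

print(f"Scaling factor       : {scaling}")
print(f"Adapter parameters   : {adapter_params}")
print(f"Full matrix parameters: {full_params}")
print(f"Trainable fraction   : {adapter_params / full_params:.2%}")
```

Under these assumptions the adapter trains well under 1% of the parameters of the matrix it modifies, which is what makes fine-tuning feasible on a single consumer GPU.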
Libraries:
transformers
datasets
peft
bitsandbytes
trl
evaluate
Intended Use
The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.
Limitations and Biases
Knowledge Cutoff: The model's knowledge is limited to the data it was trained on. It may not have information on events or developments that occurred after the dataset was created.
Accuracy: While the model strives to provide accurate answers, it may occasionally produce incorrect or nonsensical responses. Always verify critical information from reliable sources.
Biases: The model may inherit biases present in the training data. Users should be cautious and critically evaluate the model's outputs, especially in sensitive contexts.
Acknowledgements
Base Model: Meta AI's Llama-3.2-1B
Dataset: Databricks Dolly 15k Subset for General QA
Libraries Used: transformers, datasets, peft, bitsandbytes, trl, evaluate
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"

# Read the adapter config to find the base model it was trained on
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map='auto',
    return_dict=True
)

# Load the tokenizer and reuse the EOS token for padding
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter weights to the base model
peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
Running Inference
def create_chat_template(question, context):
    # The template text starts at column 0 so the prompt carries no extra leading spaces
    text = f"""
[Instruction] You are a question-answering agent which answers the question based on the related reviews.
If related reviews are not provided, you can generate the answer based on the question.\n
[Question] {question}\n
[Related Reviews] {context}\n
[Answer]
"""
    return text
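from transformers import GenerationConfig

def generate_response(question, context=""):
    # Build the prompt and move the tokenized inputs to the model's device
    text = create_chat_template(question, context)
    inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(peft_loaded_model.device)
    # Sampling settings. penalty_alpha is dropped because it only applies to
    # contrastive search, which is not used when do_sample=True.
    config = GenerationConfig(
        max_length=256,
        temperature=0.5,
        top_k=5,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id
    )
    # Generate with the adapter-wrapped model and decode the first sequence
    response = peft_loaded_model.generate(**inputs, generation_config=config)
    output = tokenizer.decode(response[0], skip_special_tokens=True)
    return output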
# Example usage (no related context is supplied, so pass an empty string)
question = "Explain the process of photosynthesis."
response = generate_response(question, context="")
print(response)
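Because a causal language model returns the prompt followed by its continuation, the decoded string printed above will typically include the instruction and question as well as the answer. A minimal sketch for keeping only the answer is shown below, assuming the [Answer] marker from the prompt template appears once in the output. extract_answer is a hypothetical helper, and the sample string is made up for illustration rather than real model output.

```python
def extract_answer(generated_text: str, marker: str = "[Answer]") -> str:
    """Return the text after the first marker, or the whole text if the marker is absent."""
    _, found, tail = generated_text.partition(marker)
    return tail.strip() if found else generated_text.strip()


# Made-up decoded output in the shape produced by the prompt template.
sample_output = (
    "[Question] Explain the process of photosynthesis.\n"
    "[Related Reviews] \n"
    "[Answer]\n"
    "Plants convert light into chemical energy.\n"
)

print(extract_answer(sample_output))
```

An alternative is to slice off the prompt tokens before decoding; the string-based approach is used here only because it needs no model to demonstrate.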