Model Card for vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF

This model is a fine-tuned version of Llama-2-7b-chat, trained on company-specific question-answer data. It is distributed as a quantized GGUF file for efficient inference, with the aim of preserving output quality, and is intended for conversational AI applications.

Full Tutorial on Cheap Finetuning

https://github.com/VishanOberoi/FineTuningForTheGPUPoor?tab=readme-ov-file

Model Details

The model was fine-tuned with QLoRA using the PEFT library. After fine-tuning, the adapters were merged into the base model, and the merged model was then quantized and saved in GGUF format.

Developed by: Vishan Oberoi and Dev Chandan
Model type: Transformer-based Large Language Model
Language(s) (NLP): English
License: MIT
Finetuned from model: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
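
The adapter merge described above can be illustrated numerically. LoRA stores the weight update as a low-rank product, and merging folds it into the base weight as W + (alpha / r) * B @ A. The following is a toy sketch in pure Python under that assumption; the function names and matrix values are hypothetical and only show the arithmetic. In practice PEFT performs the merge on the model tensors via `merge_and_unload()`, though this card does not state the exact call the authors used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (all values are illustrative).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (2, r) with r = 1
lora_a = [[0.5, 0.25]]    # shape (r, 2)

merged = merge_lora(w, lora_b, lora_a, alpha=2, r=1)
print(merged)  # [[2.0, 0.5], [2.0, 2.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is what allows the model to be exported as a single GGUF file.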

Model Sources

Repository: vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF
Links:
- LLaMA: LLaMA Paper
- QLoRA: QLoRA Paper
- llama.cpp: llama.cpp Paper/Documentation

Uses

This model is intended for direct use in conversational AI, particularly for answering questions from company-specific data. It suits customer service bots, FAQ bots, and similar applications that need accurate, contextually relevant answers.

Example with `ctransformers`:

from ctransformers import AutoModelForCausalLM

# Load the GGUF file from the Hugging Face Hub.
# gpu_layers is the number of layers to run on the GPU.
llm = AutoModelForCausalLM.from_pretrained(
    "vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF",
    model_file="finetuned.gguf",
    model_type="llama",
    gpu_layers=50,
    max_new_tokens=2000,
    temperature=0.2,
    top_k=40,
    top_p=0.6,
    context_length=6000,
)

system_prompt = "<<SYS>>You are a useful bot... <</SYS>>"
user_prompt = "Tell me about your company"

# Combine system prompt with user prompt
full_prompt = f"{system_prompt}\n[INST]{user_prompt}[/INST]"

# Generate the response
response = llm(full_prompt)

# Print the response
print(response)
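
When asking several questions, it may help to wrap the prompt assembly in a small function. The sketch below assumes the same layout as the example above; `build_prompt` is a hypothetical helper name and is not part of `ctransformers`. Note that this layout places the system block before `[INST]`, which differs from Meta's reference Llama 2 chat template, so adjust it if the model was fine-tuned with a different template.

```python
def build_prompt(system_message: str, user_message: str) -> str:
    """Assemble a prompt in the same layout as the example above."""
    system_block = f"<<SYS>>{system_message}<</SYS>>"
    return f"{system_block}\n[INST]{user_message}[/INST]"


# Build one prompt per question, reusing the same system message.
questions = ["Tell me about your company", "What are your opening hours?"]
prompts = [build_prompt("You are a useful bot... ", q) for q in questions]

for p in prompts:
    print(p)
```

Each resulting string can be passed to `llm(...)` in place of `full_prompt`.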