LLaMA-2 70B AQLM 2-bit QLoRA with function calling

This model is fine-tuned from BlackSamorez/Llama-2-70b-AQLM-2Bit-1x16-hf using LLaMA Factory.

Peak GPU memory usage during training is 24 GB. The model has preliminary conversational and tool-use abilities.

Inference requires at least 20 GB of GPU memory (VRAM).
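
As a rough back-of-the-envelope check on this figure, the storage needed for quantized weights can be estimated as parameters × bits per weight ÷ 8 bytes. The sketch below assumes a round 70 × 10^9 parameters, all stored at exactly 2 bits. It ignores any layers kept in higher precision (such as embeddings), the AQLM codebooks, the LoRA adapter, the KV cache and activations, all of which add to the real footprint. `weight_memory_gb` is a hypothetical helper written for this illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Hypothetical helper: bytes needed to store the weights, in decimal GB (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: a round 70e9 parameters, every one stored at 2 bits.
quantized = weight_memory_gb(70e9, 2)
# The same weights in 16-bit floating point, for comparison.
fp16 = weight_memory_gb(70e9, 16)

print(f"2-bit weights: ~{quantized:.1f} GB")  # 2-bit weights: ~17.5 GB
print(f"fp16 weights:  ~{fp16:.1f} GB")       # fp16 weights:  ~140.0 GB
```

The estimate of roughly 17.5 GB for the 2-bit weights, compared with about 140 GB at 16-bit precision, is consistent with a 20 GB minimum once the overheads listed above are added.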


Training and evaluation data

This model is fine-tuned using 2,000 examples of the Alpaca-GPT4 and Glaive-function-calling-v2 datasets.

Usage

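from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel

# Loading the AQLM-quantized base model also requires the aqlm package to be installed.
# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained("hiyouga/Llama-2-70b-AQLM-2Bit-QLoRA-function-calling")
# Load the 2-bit AQLM base model, then attach this QLoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained("BlackSamorez/Llama-2-70b-AQLM-2Bit-1x16-hf", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, "hiyouga/Llama-2-70b-AQLM-2Bit-QLoRA-function-calling")
# Stream tokens to stdout as they are generated, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)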

# The system message lists the available tool and the format the model should use to call it.
messages = [
    {
      "role": "system",
      "content": (
        "You have access to the following tools:\n"
        "> Tool Name: get_current_weather\nTool Description: Get the current weather in a given location\n"
        "Tool Args:\n"
        "  - location (string, required): The city and state, e.g. San Francisco, CA\n"
        "  - unit (string): should be one of [\"celsius\", \"fahrenheit\"]\n\n"
        "Use the following format if using a tool:\n"
        "```\n"
        "Action: tool name (one of [get_current_weather]).\n"
        "Action Input: the input to the tool, in a JSON format representing the kwargs "
        "(e.g. ```{\"input\": \"hello world\", \"num_beams\": 5}```).\n"
        "```\n"
      )
    },
    {"role": "user", "content": "What is the weather like in Boston?"}
]
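# Render the conversation with the chat template and append the assistant-turn prefix.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
inputs = inputs.to("cuda")
# The reply (plain text or an "Action: ... / Action Input: ..." tool call) is printed by the streamer.
generate_ids = model.generate(inputs, streamer=streamer)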
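
The snippet above streams the raw reply but does not act on it. If the model follows the requested format, a tool call can be recovered from the decoded reply, for example `tokenizer.decode(generate_ids[0][inputs.shape[-1]:], skip_special_tokens=True)`. The sketch below is a hypothetical parser: `parse_tool_call`, `ACTION_RE` and `INPUT_RE` are illustrative names, not part of transformers, PEFT or LLaMA Factory. It assumes one "Action:" line followed by an "Action Input:" line containing a JSON object, and returns None for anything else.

```python
import json
import re
from typing import Any, Optional

# Hypothetical helper. It assumes the reply contains one "Action: <tool name>" line
# followed by an "Action Input:" line whose value contains a JSON object.
ACTION_RE = re.compile(r"^Action:\s*(?P<name>.+?)\s*$", re.MULTILINE)
INPUT_RE = re.compile(r"^Action Input:\s*(?P<args>.*)", re.MULTILINE | re.DOTALL)


def parse_tool_call(text: str) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (tool_name, kwargs) if `text` contains a tool call, otherwise None."""
    name_match = ACTION_RE.search(text)
    input_match = INPUT_RE.search(text)
    if name_match is None or input_match is None:
        return None  # treat the reply as an ordinary chat answer
    name = name_match.group("name").rstrip(".")
    raw_args = input_match.group("args")
    start = raw_args.find("{")  # skip any backticks or prose before the JSON
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value starting at `start` and ignores what follows
        kwargs, _ = json.JSONDecoder().raw_decode(raw_args, start)
    except json.JSONDecodeError:
        return None
    if not isinstance(kwargs, dict):
        return None
    return name, kwargs


reply = 'Action: get_current_weather\nAction Input: {"location": "Boston, MA"}'
print(parse_tool_call(reply))                     # ('get_current_weather', {'location': 'Boston, MA'})
print(parse_tool_call("It is sunny in Boston."))  # None
```

Returning None instead of raising an exception sends ordinary chat answers and malformed calls down the same fallback path. A real application would still need to validate the arguments against the tool's declared parameters before executing it.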

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
  • mixed_precision_training: Native AMP
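
The listed total_train_batch_size is train_batch_size × gradient_accumulation_steps, multiplied by the number of devices. Together with the 2,000 training examples and one epoch, this determines the number of optimizer steps over which the cosine schedule decays. The sketch below assumes a single GPU, that every example is kept, and no warmup (none is listed). `cosine_lr` is an illustrative helper with the same shape as transformers' `get_cosine_schedule_with_warmup` at its default `num_cycles=0.5`. It is not the exact code path LLaMA Factory used.

```python
import math

# Hyperparameters from the list above; the example count comes from the
# "Training and evaluation data" section.
learning_rate = 1e-4
train_batch_size = 1
gradient_accumulation_steps = 8
num_epochs = 1
num_examples = 2_000
num_devices = 1  # assumption: single-GPU training

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
steps = math.ceil(num_examples / total_train_batch_size) * num_epochs


def cosine_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Optional linear warmup (assumed 0 here), then cosine decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, steps)  # 8 250
for step in (0, steps // 2, steps):
    print(step, f"{cosine_lr(step, steps, learning_rate):.1e}")
# 0 1.0e-04
# 125 5.0e-05
# 250 0.0e+00
```

Under these assumptions, the learning rate halves by the midpoint of the single epoch and reaches zero at the last of roughly 250 optimizer steps.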

Training results

(Figure: training loss curve.)

Benchmark results

MMLU Benchmark    Bits   Metric          Accuracy (%)
Average           2      5-shot, top-1   62.38
STEM              2      5-shot, top-1   51.57
Social Sciences   2      5-shot, top-1   73.44
Humanities        2      5-shot, top-1   57.82
Other             2      5-shot, top-1   68.56

Framework versions

  • PEFT 0.9.0
  • Transformers 4.39.0.dev0
  • Pytorch 2.2.1+cu121
  • Datasets 2.15.0
  • Tokenizers 0.15.2