---
library_name: transformers
tags:
  - trl
  - sft
---

Model Card for phi-2-function-calling

Model Overview

Summary of the Model

The primary purpose of this fine-tuned model is Function Calling. It is a fine-tuned version of microsoft/phi-2 specifically adapted to handle function-calling tasks efficiently. The model can generate structured text, making it particularly suited for scenarios requiring automated function invocation based on textual instructions.

Model Details

Model Description

  • Developed by: Microsoft; fine-tuned by Carlos Rodrigues (DataKensei)
  • Model Type: Text generation, fine-tuned for function-calling tasks.
  • Language(s): English
  • License: MIT License
  • Finetuned from model: microsoft/phi-2

Model Sources

Uses

Direct Use

The model is directly usable for generating function calls based on user prompts. This includes structured tasks like scheduling meetings, calculating savings, or any scenario where a text input should translate into an actionable function.
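
For illustration, given a prompt that lists a scheduling function and a request such as "Schedule a meeting with Ana tomorrow at 10:00", the intended completion is a structured call along these lines (the exact output format depends on the fine-tuning data; this is a hypothetical example, not guaranteed output):

{"name": "schedule_meeting", "arguments": {"attendee": "Ana", "date": "tomorrow", "time": "10:00"}}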

Downstream Use

While the model is primarily designed for function calling, it can be fine-tuned further or integrated into larger systems where similar structured text generation is required. For example, it could be part of a larger chatbot system that automates task handling.
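
As a minimal integration sketch, assuming the model's completion is a single JSON object with "name" and "arguments" fields (a hypothetical format, not confirmed by this card), the generated call could be routed to ordinary Python functions:

import json

# Hypothetical functions the assistant is allowed to call.
def schedule_meeting(attendee, date, time):
    return f"Meeting with {attendee} scheduled for {date} at {time}."

FUNCTIONS = {"schedule_meeting": schedule_meeting}

def dispatch(generated_text):
    # Parse the structured completion and route it to the matching callable;
    # adapt the parsing to whatever format your prompts actually produce.
    call = json.loads(generated_text)
    return FUNCTIONS[call["name"]](**call["arguments"])

print(dispatch('{"name": "schedule_meeting", "arguments": {"attendee": "Ana", "date": "tomorrow", "time": "10:00"}}'))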

Out-of-Scope Use

The model is not designed for tasks unrelated to structured text generation or function calling. Misuse might include attempts to use it for general-purpose language modeling or content generation beyond its specialized training focus.

Bias, Risks, and Limitations

Biases

The model may inherit biases from the base model (microsoft/phi-2), particularly those related to the English language and specific function-calling tasks. Users should be aware of potential biases in task framing and language interpretation.

Limitations

  • Task-Specific: The model is specialized for function-calling tasks and might not perform well on other types of text generation tasks.
  • English Only: The model is limited to English, and performance in other languages is not guaranteed.

Recommendations

Users should test the model in their specific environment to ensure it performs as expected for the desired use case. Awareness of the model's biases and limitations is crucial when deploying it in critical systems.

How to Get Started with the Model

You can use the following code snippet to get started with the model:

from transformers import pipeline

# Load the model and tokenizer
pipe = pipeline(task="text-generation", model="DataKensei/phi-2-function-calling")

# Example prompt
prompt = '''
<|im_start|>system
You are a helpful assistant with access to the following functions. Use these functions when they are relevant to assist with a user's request.
[
    {
        "name": "calculate_retirement_savings",
        "description": "Project the savings at retirement based on current contributions.",
        "parameters": {
            "type": "object",
            "properties": {
                "current_age": {
                    "type": "integer",
                    "description": "The current age of the individual."
                },
                "retirement_age": {
                    "type": "integer",
                    "description": "The desired retirement age."
                },
                "current_savings": {
                    "type": "number",
                    "description": "The current amount of savings."
                },
                "monthly_contribution": {
                    "type": "number",
                    "description": "The monthly contribution towards retirement savings."
                }
            },
            "required": ["current_age", "retirement_age", "current_savings", "monthly_contribution"]
        }
    }
]
<|im_start|>user
I am currently 40 years old and plan to retire at 65. I have no savings at the moment, but I intend to save $500 every month. Could you project the savings at retirement based on current contributions?
'''

result = pipe(prompt)
print(result[0]['generated_text'])
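
If GPU memory is limited, the same checkpoint can also be loaded with 4-bit quantization, mirroring the precision used during fine-tuning. This optional sketch assumes the bitsandbytes and accelerate packages are installed; the quantization and generation settings are illustrative, not prescriptive:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

# 4-bit (NF4) quantization config; common defaults, not the card's prescribed settings.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "DataKensei/phi-2-function-calling",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DataKensei/phi-2-function-calling")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = pipe(prompt, max_new_tokens=256)  # reuse the prompt defined above
print(result[0]['generated_text'])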

Training Details

Training Data

The model was fine-tuned on a synthetic dataset of function-calling prompts and responses. The data was curated to cover a wide range of potential function calls, ensuring the model's applicability to various structured text generation tasks.

The script to generate the data can be found in this repository.
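
The exact record layout is defined by that script; purely as a hypothetical illustration, each training example pairs a system message listing the available function schemas with a user request and the expected structured call, roughly:

<|im_start|>system
You are a helpful assistant with access to the following functions... [JSON function schemas]
<|im_start|>user
I am currently 40 years old and plan to retire at 65. I intend to save $500 every month...
<|im_start|>assistant
{"name": "calculate_retirement_savings", "arguments": {"current_age": 40, "retirement_age": 65, "current_savings": 0, "monthly_contribution": 500}}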

Training Procedure

  • Training regime: The model was fine-tuned using 4-bit precision with bnb_4bit quantization on NVIDIA GPUs (see the sketch after this list).
  • Optimizer: PagedAdamW (32-bit)
  • Learning Rate: 2e-4
  • Batch Size: 2 (with gradient accumulation steps = 1)
  • Epochs: 1
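
Putting these settings together, a minimal fine-tuning sketch is shown below. It assumes trl's SFTTrainer with a peft LoRA adapter (the Software section below mentions PEFT), a local JSONL file of pre-formatted examples, and illustrative LoRA hyperparameters; exact SFTTrainer arguments vary between trl versions:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

# Load the base model in 4-bit (bnb_4bit), as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical file name; the actual synthetic data comes from the
# generation script referenced under Training Data.
dataset = load_dataset("json", data_files="function_calling_train.jsonl", split="train")

# LoRA adapter settings (illustrative values, including the target modules).
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

args = TrainingArguments(
    output_dir="phi-2-function-calling",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
)

# Older trl releases accept tokenizer and dataset_text_field directly;
# newer ones move these into SFTConfig.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    args=args,
)
trainer.train()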

Preprocessing

The training and evaluation data was generated using this repository.

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a separate test set comprising 10% of the original dataset and covering a variety of function-calling scenarios.
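
A held-out split of this kind can be reproduced with the datasets library; the file name below is hypothetical and the seed is illustrative:

from datasets import load_dataset

# 90/10 split of the synthetic function-calling dataset.
dataset = load_dataset("json", data_files="function_calling_data.jsonl", split="train")
split = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = split["train"], split["test"]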

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Environmental Impact

Experiments were conducted using a private infrastructure, which has a carbon efficiency of 0.432 kgCO₂eq/kWh. A cumulative 10 hours of computation was performed on hardware of type GTX 1080 (TDP of 180 W).

Total emissions are estimated to be 0.78 kgCO₂eq, of which 0 percent was directly offset.
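
For reference, this figure follows directly from the stated assumptions: 180 W × 10 h = 1.8 kWh, and 1.8 kWh × 0.432 kgCO₂eq/kWh ≈ 0.78 kgCO₂eq.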

Estimations were conducted using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

@article{lacoste2019quantifying,
  title={Quantifying the Carbon Emissions of Machine Learning},
  author={Lacoste, Alexandre and Luccioni, Alexandra and Schmidt, Victor and Dandres, Thomas},
  journal={arXiv preprint arXiv:1910.09700},
  year={2019}
}
  • Hardware Type: NVIDIA GPUs (GTX 1080)
  • Hours used: 10
  • Cloud Provider: Private Infrastructure
  • Carbon Emitted: 0.78 kgCO₂eq

Technical Specifications

Model Architecture and Objective

The model is based on the "microsoft/phi-2" architecture, fine-tuned specifically for function-calling tasks. The objective was to optimize the model's ability to generate structured text suitable for automated function execution.

Compute Infrastructure

[More Information Needed]

Hardware

The model was trained on NVIDIA GPUs.

Software

The training used PyTorch and the Hugging Face Transformers library, with additional support from the PEFT library for fine-tuning.

Citation

BibTeX:

@misc{phi2functioncalling,
  title={phi-2-function-calling},
  author={Carlos Rodrigues},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/DataKensei/phi-2-function-calling}},
}

Model Card Contact

For more information, please contact Carlos Rodrigues at DataKensei.