# Ticket Classification Model

This model fine-tunes the [DistilBERT base uncased model](https://huggingface.co/distilbert/distilbert-base-uncased) to classify customer support tickets into four categories. It achieves **94.85% accuracy** on the evaluation dataset.

## Model description

This model is designed to automatically categorize customer support tickets based on their content. It can classify tickets into the following categories:

- **Billing Question**: Issues related to billing, payments, subscriptions, etc.
- **Feature Request**: Suggestions for new features or improvements
- **General Inquiry**: General questions about products or services
- **Technical Issue**: Technical problems, bugs, errors, etc.

These categories correspond to label IDs 0 (Billing Question), 1 (Feature Request), 2 (General Inquiry), and 3 (Technical Issue).

The model uses DistilBERT as its base architecture. DistilBERT is a distilled version of BERT that its authors report to be about 40% smaller and 60% faster than BERT while retaining roughly 97% of its language understanding performance.

## Intended uses & limitations

### Intended uses

- Automated ticket routing and prioritization
- Customer support workflow optimization
- Analytics on ticket categories
- Real-time ticket classification

### Limitations

- The model was trained on a specific dataset and may not generalize well to significantly different customer support contexts
- Performance may degrade for very technical or domain-specific tickets not represented in the training data
- Very short or ambiguous tickets might be misclassified
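A minimal routing sketch follows, assuming the pipeline output format shown in the Usage section (a list holding one `{'label': ..., 'score': ...}` dict per ticket). The queue names and the 0.6 confidence threshold are hypothetical placeholders, not values from this model. Sending low-confidence predictions to manual triage is one way to handle the short or ambiguous tickets mentioned above.

```python
from typing import Any, Dict, List

# Hypothetical queue names; replace with your own support queues.
QUEUES: Dict[str, str] = {
    "Billing Question": "billing-team",
    "Feature Request": "product-team",
    "General Inquiry": "frontline-support",
    "Technical Issue": "engineering-support",
}

# Hypothetical cut-off: predictions below it go to a human for triage.
MIN_CONFIDENCE = 0.6


def route_ticket(prediction: List[Dict[str, Any]]) -> str:
    """Pick a queue from one ticket's text-classification pipeline result."""
    # With default settings the pipeline returns a one-element list with the top label.
    top = prediction[0]
    label = str(top["label"])
    score = float(top["score"])
    if score < MIN_CONFIDENCE or label not in QUEUES:
        return "manual-triage"
    return QUEUES[label]


print(route_ticket([{"label": "Billing Question", "score": 0.97}]))  # billing-team
print(route_ticket([{"label": "Technical Issue", "score": 0.41}]))   # manual-triage
```

In production the argument would be `classifier(ticket)` from the Usage example; the threshold is worth tuning on held-out tickets rather than taken as-is.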

## Training data

The model was trained on a dataset of customer support tickets that includes diverse examples across all four categories. Each ticket typically contains a title and a description detailing the customer's issue or request.

## Training procedure

### Model architecture

The model is based on `distilbert-base-uncased` with a classification head on top. It was fine-tuned using the Hugging Face Transformers library.

### Training hyperparameters

- **Learning rate**: 0.001
- **Batch size**: 2
- **Epochs**: 10 (with early stopping)
- **Weight decay**: 0.01
- **Early stopping patience**: 2 epochs
- **Optimizer**: AdamW
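The early-stopping rule above can be sketched in plain Python: training runs for at most 10 epochs and stops once the monitored metric has failed to improve for 2 consecutive epochs. This sketch assumes validation loss is the monitored metric, which the card does not state. In the Transformers `Trainer`, this behaviour is usually configured with `EarlyStoppingCallback(early_stopping_patience=2)` together with `load_best_model_at_end=True`; the function below only illustrates the logic and is not the training code used for this model.

```python
from typing import List, Tuple


def early_stopping_epoch(
    val_losses: List[float], patience: int = 2, max_epochs: int = 10
) -> Tuple[int, int]:
    """Return (epochs_run, best_epoch), both 1-indexed, under patience-based early stopping."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row
    return epochs_run, best_epoch


# Hypothetical validation losses: the best value comes at epoch 3,
# then two epochs without improvement trigger the stop after epoch 5.
print(early_stopping_epoch([0.61, 0.40, 0.31, 0.33, 0.35, 0.29]))  # (5, 3)
```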

## Evaluation results

The model achieved the following metrics on the evaluation dataset:

| Metric | Value |
|--------|-------|
| Accuracy | 94.85% |
| Loss | 0.248 |
| Runtime | 16.01s |
| Samples/second | 23.05 |
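As a rough consistency check, assuming Samples/second is computed the way the Hugging Face `Trainer` reports `eval_samples_per_second` (number of evaluation samples divided by runtime), the table implies an evaluation set of roughly 369 tickets:

$$
N \approx 16.01\,\text{s} \times 23.05\,\tfrac{\text{samples}}{\text{s}} \approx 369, \qquad \frac{350}{369} \approx 0.9485
$$

An accuracy of 94.85% is therefore consistent with about 350 correctly classified tickets, although the exact size of the evaluation set is not stated in this card.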

## Usage

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="Dragneel/ticket-classification-v1")

# Example tickets
tickets = [
    "I was charged twice for my subscription this month. Can you help?",
    "The app keeps crashing whenever I try to upload a file",
    "Would it be possible to add dark mode to the dashboard?",
    "What are your business hours?"
]

# Classify tickets
for ticket in tickets:
    result = classifier(ticket)
    print(f"Ticket: {ticket}")
    print(f"Category: {result[0]['label']}")
    print(f"Confidence: {result[0]['score']:.4f}")
    print()
```

### ID to Label Mapping

```python
id_to_label = {
    0: 'Billing Question',
    1: 'Feature Request',
    2: 'General Inquiry',
    3: 'Technical Issue'
}
```
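If you fine-tune or re-export the model yourself, the inverse mapping can be derived from the dictionary above. `AutoModelForSequenceClassification.from_pretrained` accepts both dictionaries as the `id2label` and `label2id` keyword arguments alongside `num_labels=4`; the sketch below only builds and checks the mappings in plain Python, and the `label_to_id` name is our own.

```python
from typing import Dict

# Mapping from the model card: class index -> category name.
id_to_label: Dict[int, str] = {
    0: "Billing Question",
    1: "Feature Request",
    2: "General Inquiry",
    3: "Technical Issue",
}

# Inverse mapping: category name -> class index.
label_to_id: Dict[str, int] = {label: idx for idx, label in id_to_label.items()}

# Sanity check: the two mappings must round-trip for every class.
assert all(label_to_id[id_to_label[i]] == i for i in id_to_label)

print(label_to_id["Technical Issue"])  # 3

# With Transformers installed, both could be passed when loading the base model, e.g.:
# AutoModelForSequenceClassification.from_pretrained(
#     "distilbert-base-uncased", num_labels=4,
#     id2label=id_to_label, label2id=label_to_id,
# )
```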

## Model inputs and outputs

### Inputs

The model takes text inputs representing customer support tickets. These can be in the form of titles, descriptions, or both combined.
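When both a title and a description are available, they have to be joined into a single string before classification. The card does not document the format used during training, so the helper below is a hypothetical sketch: the `build_ticket_text` name and the `". "` separator are assumptions, and matching the training-time format, if known, is likely to work best.

```python
from typing import Optional


def build_ticket_text(title: Optional[str], description: Optional[str]) -> str:
    """Hypothetical helper: join a ticket's title and description into one input string."""
    clean_title = (title or "").strip().rstrip(".")
    clean_description = (description or "").strip()
    if clean_title and clean_description:
        # Assumed separator; the format used during training is not documented.
        return f"{clean_title}. {clean_description}"
    # Fall back to whichever field is present (or an empty string).
    return clean_title or clean_description


text = build_ticket_text("Double charge", "I was charged twice for my subscription this month.")
print(text)  # Double charge. I was charged twice for my subscription this month.
```

The resulting string can then be passed to the pipeline from the Usage section, e.g. `classifier(text)`.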

### Outputs

The model outputs one of four categories:

- Billing Question
- Feature Request
- General Inquiry
- Technical Issue

Each prediction includes a confidence score (the `score` field of the pipeline output, a value between 0 and 1).
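For a single-label model with more than one class, the Transformers `text-classification` pipeline by default converts the model's raw logits into probabilities with a softmax and reports the top class's probability as `score`. The sketch below reproduces that step in plain Python; the logits are made-up values for illustration, not real outputs of this model.

```python
import math
from typing import Dict, List, Tuple

id_to_label: Dict[int, str] = {
    0: "Billing Question",
    1: "Feature Request",
    2: "General Inquiry",
    3: "Technical Issue",
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: List[float]) -> Tuple[str, float]:
    """Return (category, confidence) for one ticket's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id_to_label[best], probs[best]


# Hypothetical logits for a ticket that looks like a billing question.
label, score = predict([3.2, -1.0, 0.4, -0.5])
print(f"Category: {label}")  # Category: Billing Question
print(f"Confidence: {score:.4f}")
```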

## Debugging and Logging

Logging can be enabled to help troubleshoot classification issues. This is particularly useful when integrating the model into production environments.
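One way to do this is sketched below, assuming the `classifier` pipeline from the Usage section. Messages from the Transformers library itself can be made more verbose with `transformers.logging.set_verbosity_debug()`; the wrapper additionally uses Python's standard `logging` module to record each prediction, its confidence, and its latency. The `classify_with_logging` function, the logger name, and the stand-in classifier are hypothetical and not part of the model or the library.

```python
import logging
import time
from typing import Any, Callable, Dict, List

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
# Hypothetical logger name for your application.
logger = logging.getLogger("ticket_classifier")

Prediction = List[Dict[str, Any]]


def classify_with_logging(classifier: Callable[[str], Prediction], ticket: str) -> Prediction:
    """Call the classifier and log its top label, confidence score, and latency."""
    start = time.perf_counter()
    result = classifier(ticket)
    elapsed_ms = (time.perf_counter() - start) * 1000
    top = result[0]
    logger.info(
        "label=%s score=%.4f latency_ms=%.1f text=%r",
        top["label"], top["score"], elapsed_ms, ticket[:80],
    )
    # Full output is only emitted when the logger level is set to DEBUG.
    logger.debug("full result: %s", result)
    return result


# Stand-in for the real pipeline so the sketch runs without downloading the model.
def fake_classifier(text: str) -> Prediction:
    return [{"label": "General Inquiry", "score": 0.88}]


classify_with_logging(fake_classifier, "What are your business hours?")
```

With the real model, call `classify_with_logging(classifier, ticket)` instead of passing the stand-in.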

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{Dragneel2023ticket-classification,
  author = {Dragneel},
  title = {Ticket Classification Model},
  year = {2023},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Dragneel/ticket-classification-v1}
}
```