---
language:
- en
license: apache-2.0
tags:
- text-classification
- customer-support
- ticket-classification
- distilbert
datasets:
- custom
metrics:
- accuracy
model-index:
- name: ticket-classification-v1
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Custom Ticket Dataset
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9485
---
# Model Card for Dragneel/ticket-classification-v1
This model fine-tunes the DistilBERT base uncased model to classify customer support tickets into four categories. It achieves **94.85% accuracy** on the evaluation dataset.
## Model Details
### Model Description
This model is designed to automatically categorize customer support tickets based on their content. It can classify tickets into the following categories:
- **Billing Question**: Issues related to billing, payments, subscriptions, etc.
- **Feature Request**: Suggestions for new features or improvements
- **General Inquiry**: General questions about products or services
- **Technical Issue**: Technical problems, bugs, errors, etc.
The model uses DistilBERT as its base architecture: a distilled version of BERT that is smaller and faster while retaining most of BERT's performance.
- **Developed by:** Dragneel
- **Model type:** Text Classification
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased)
## Uses
### Direct Use
This model can be directly used for:
- Automated ticket routing and prioritization
- Customer support workflow optimization
- Analytics on ticket categories
- Real-time ticket classification
### Out-of-Scope Use
This model should not be used for:
- Processing sensitive customer information without proper privacy measures
- Making final decisions without human review for complex or critical issues
- Classifying tickets in languages other than English
- Categorizing content outside the customer support domain
## Bias, Risks, and Limitations
- The model was trained on a specific dataset and may not generalize well to significantly different customer support contexts
- Performance may degrade for very technical or domain-specific tickets not represented in the training data
- Very short or ambiguous tickets might be misclassified
### Recommendations
Users should review classifications for accuracy, especially for tickets that fall on the boundary between categories. Consider retraining the model on domain-specific data before deploying it in a specialized industry.
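One way to act on this recommendation is to escalate low-confidence predictions to a human agent instead of trusting them automatically. The helper below is a hypothetical sketch, not part of the model: it assumes the standard `pipeline` output format (`{'label': ..., 'score': ...}`), and the `0.80` threshold is a placeholder to be tuned on your own data.

```python
# Hypothetical triage helper: escalate low-confidence predictions for review.
REVIEW_THRESHOLD = 0.80  # assumed cutoff; tune on a held-out set of your own tickets

def route_ticket(prediction, threshold=REVIEW_THRESHOLD):
    """prediction: a dict like {'label': 'Technical Issue', 'score': 0.97},
    as returned by the text-classification pipeline for one ticket."""
    if prediction["score"] < threshold:
        return "human-review"
    return prediction["label"]
```

For example, `route_ticket({"label": "Billing Question", "score": 0.55})` returns `"human-review"`, while a confident prediction passes its category label straight through to the routing system.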
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="Dragneel/ticket-classification-v1")

# Example tickets
tickets = [
    "I was charged twice for my subscription this month. Can you help?",
    "The app keeps crashing whenever I try to upload a file",
    "Would it be possible to add dark mode to the dashboard?",
    "What are your business hours?",
]

# Classify tickets
for ticket in tickets:
    result = classifier(ticket)
    print(f"Ticket: {ticket}")
    print(f"Category: {result[0]['label']}")
    print(f"Confidence: {result[0]['score']:.4f}")
    print()
```
### ID to Label Mapping
```python
id_to_label = {
    0: 'Billing Question',
    1: 'Feature Request',
    2: 'General Inquiry',
    3: 'Technical Issue'
}
```
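If the checkpoint's config does not carry human-readable label names, the pipeline may emit raw ids such as `LABEL_2`. A small hypothetical helper (not part of the model) can translate those using the mapping above:

```python
id_to_label = {
    0: 'Billing Question',
    1: 'Feature Request',
    2: 'General Inquiry',
    3: 'Technical Issue',
}

def to_category(raw_label):
    """Map a raw pipeline label such as 'LABEL_2' to its category name.
    Labels that are already human-readable pass through unchanged."""
    if raw_label.startswith("LABEL_"):
        return id_to_label[int(raw_label.split("_")[1])]
    return raw_label
```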
## Training Details
### Training Data
The model was trained on a dataset of customer support tickets with diverse examples across all four categories. Each ticket typically contains a title and a description detailing the customer's issue or request.
### Training Procedure
#### Training Hyperparameters
- **Learning rate:** 0.001
- **Batch size:** 2
- **Epochs:** 10 (with early stopping)
- **Weight decay:** 0.01
- **Early stopping patience:** 2 epochs
- **Optimizer:** AdamW
- **Training regime:** fp32
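The hyperparameters above can be assembled into a `transformers` `Trainer` setup. The sketch below is a hedged reconstruction, not the author's actual training script: the output directory is a placeholder, and dataset loading and tokenization are omitted.

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments, EarlyStoppingCallback)

# Four-way classification head on top of DistilBERT
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=4)

args = TrainingArguments(
    output_dir="ticket-classification-v1",  # placeholder path
    learning_rate=1e-3,
    per_device_train_batch_size=2,
    num_train_epochs=10,
    weight_decay=0.01,
    eval_strategy="epoch",        # 'evaluation_strategy' in older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
)

trainer = Trainer(
    model=model,
    args=args,
    # train_dataset=..., eval_dataset=...,  # the custom ticket dataset goes here
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
# trainer.train()
```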
## Evaluation
### Testing Data, Factors & Metrics
#### Metrics
The model is evaluated using the following metrics:
- Accuracy: Percentage of correctly classified tickets
- Loss: Cross-entropy loss on the evaluation dataset
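Accuracy here is simply the fraction of tickets whose predicted category matches the gold label; a minimal reference implementation:

```python
def accuracy(predictions, labels):
    """Fraction of tickets whose predicted category matches the gold label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```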
### Results
The model achieved the following metrics on the evaluation dataset:
| Metric | Value |
|--------|-------|
| Accuracy | 94.85% |
| Loss | 0.248 |
| Runtime | 16.01s |
| Samples/second | 23.05 |
## Technical Specifications
### Model Architecture and Objective
The model architecture is based on DistilBERT, a distilled version of BERT. It consists of the base DistilBERT model with a classification head layer on top. The model was fine-tuned using cross-entropy loss to predict the correct category for each ticket.
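Concretely, the classification head maps the pooled DistilBERT representation to four logits, a softmax turns those into category probabilities, and the predicted category is the argmax. A minimal sketch of that final step, using made-up logits rather than real model output:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from the classification head for one ticket
logits = [0.3, -1.2, 0.1, 2.4]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)  # index 3 -> 'Technical Issue'
```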
## Model Card Contact
For inquiries about this model, please open an issue on the model repository.