---
language:
- en
license: apache-2.0
tags:
- text-classification
- customer-support
- ticket-classification
- distilbert
datasets:
- custom
metrics:
- accuracy
model-index:
- name: ticket-classification-v1
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Custom Ticket Dataset
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9485
---
# Model Card for Dragneel/ticket-classification-v1
This model fine-tunes the DistilBERT base uncased model to classify customer support tickets into four categories. It achieves **94.85% accuracy** on the evaluation dataset.
## Model Details
### Model Description
This model is designed to automatically categorize customer support tickets based on their content. It can classify tickets into the following categories:
- **Billing Question**: Issues related to billing, payments, subscriptions, etc.
- **Feature Request**: Suggestions for new features or improvements
- **General Inquiry**: General questions about products or services
- **Technical Issue**: Technical problems, bugs, errors, etc.
The model uses DistilBERT as its base architecture, a distilled version of BERT that is smaller and faster while retaining most of BERT's performance.
- **Developed by:** Dragneel
- **Model type:** Text Classification
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased)
## Uses
### Direct Use
This model can be directly used for:
- Automated ticket routing and prioritization
- Customer support workflow optimization
- Analytics on ticket categories
- Real-time ticket classification
### Out-of-Scope Use
This model should not be used for:
- Processing sensitive customer information without proper privacy measures
- Making final decisions without human review for complex or critical issues
- Classifying tickets in languages other than English
- Categorizing content outside the customer support domain
## Bias, Risks, and Limitations
- The model was trained on a specific dataset and may not generalize well to significantly different customer support contexts
- Performance may degrade for very technical or domain-specific tickets not represented in the training data
- Very short or ambiguous tickets might be misclassified
### Recommendations
Users should review classifications for accuracy, especially for tickets that fall on the boundary between categories. Consider retraining the model on domain-specific data if you use it in a specialized industry.
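One way to act on this recommendation is to send low-confidence predictions to a human reviewer. The sketch below assumes the `[{"label": ..., "score": ...}]` output shape returned by the `transformers` text-classification pipeline for a single input. The `needs_review` helper and the 0.80 threshold are hypothetical and would need tuning on your own tickets.

```python
def needs_review(prediction, threshold=0.80):
    """Return True when the top prediction's confidence is below the threshold.

    `prediction` is one pipeline result for a single ticket, e.g.
    [{"label": "Billing Question", "score": 0.97}].
    The 0.80 default is an illustrative assumption, not a tuned value.
    """
    top = prediction[0]
    return top["score"] < threshold


# Hand-written example outputs (not real model predictions)
confident = [{"label": "Billing Question", "score": 0.97}]
borderline = [{"label": "General Inquiry", "score": 0.55}]

print(needs_review(confident))   # False
print(needs_review(borderline))  # True
```

A single global threshold is the simplest choice; per-category thresholds may work better if some categories are confused more often than others.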
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="Dragneel/ticket-classification-v1")

# Example tickets
tickets = [
    "I was charged twice for my subscription this month. Can you help?",
    "The app keeps crashing whenever I try to upload a file",
    "Would it be possible to add dark mode to the dashboard?",
    "What are your business hours?",
]

# Classify each ticket and print the top category with its confidence score
for ticket in tickets:
    result = classifier(ticket)
    print(f"Ticket: {ticket}")
    print(f"Category: {result[0]['label']}")
    print(f"Confidence: {result[0]['score']:.4f}")
    print()
```
### ID to Label Mapping
```python
id_to_label = {
    0: 'Billing Question',
    1: 'Feature Request',
    2: 'General Inquiry',
    3: 'Technical Issue'
}
```
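When the model is called directly rather than through the pipeline, its output is a vector of four logits, one per class ID. The following minimal sketch shows how such a vector could be mapped to a category name with the mapping above. The `logits_to_label` helper and the example logits are hypothetical, for illustration only.

```python
import math

id_to_label = {
    0: "Billing Question",
    1: "Feature Request",
    2: "General Inquiry",
    3: "Technical Issue",
}


def logits_to_label(logits):
    """Map a list of four raw logits to (label, softmax probability)."""
    # Subtract the max logit for numerical stability before exponentiating
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the index with the highest probability
    best_id = max(range(len(probs)), key=probs.__getitem__)
    return id_to_label[best_id], probs[best_id]


# Made-up logits for illustration only
label, prob = logits_to_label([0.2, -1.0, 0.1, 3.5])
print(label, round(prob, 4))
```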
## Training Details
### Training Data
The model was trained on a dataset of customer support tickets that includes diverse examples across all four categories. Each ticket typically contains a title and a description detailing the customer's issue or request.
### Training Procedure
#### Training Hyperparameters
- **Learning rate:** 0.001
- **Batch size:** 2
- **Epochs:** 10 (with early stopping)
- **Weight decay:** 0.01
- **Early stopping patience:** 2 epochs
- **Optimizer:** AdamW
- **Training regime:** fp32
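The interaction between the 10-epoch cap and the patience of 2 can be illustrated with a small simulation. This card does not say which metric was monitored for early stopping, so the sketch assumes evaluation loss. The `stopping_epoch` helper and the loss values are hypothetical.

```python
def stopping_epoch(eval_losses, patience=2, max_epochs=10):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored loss has failed to improve for
    `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best = float("inf")
    epochs_without_improvement = 0
    last_epoch = 0
    for epoch, loss in enumerate(eval_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return last_epoch


# Invented loss curve: improves for three epochs, then gets worse twice
losses = [0.90, 0.50, 0.30, 0.31, 0.32, 0.29, 0.28]
print(stopping_epoch(losses))  # 5
```

With this loss curve, training stops after epoch 5 even though later epochs would have improved, which is the usual trade-off of a short patience window.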
## Evaluation
### Testing Data, Factors & Metrics
#### Metrics
The model is evaluated using the following metrics:
- Accuracy: Percentage of correctly classified tickets
- Loss: Cross-entropy loss on the evaluation dataset
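Both metrics can be computed from per-ticket class probabilities. The sketch below uses invented probabilities over the four categories. The helper names are hypothetical and this is not the evaluation code used for this model.

```python
import math


def accuracy(probs, labels):
    """Fraction of examples whose highest-probability class matches the label."""
    correct = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        correct += predicted == y
    return correct / len(labels)


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


# Invented probabilities for three tickets over the four categories
probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.20, 0.60, 0.10],
    [0.40, 0.10, 0.10, 0.40],
]
labels = [0, 2, 3]  # true class IDs

print(accuracy(probs, labels))
print(cross_entropy(probs, labels))
```

Accuracy only checks whether the top class is right, while cross-entropy also penalizes correct predictions made with low confidence, so the two can move independently.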
### Results
The model achieved the following metrics on the evaluation dataset:
| Metric | Value |
|--------|-------|
| Accuracy | 94.85% |
| Loss | 0.248 |
| Runtime | 16.01s |
| Samples/second | 23.05 |
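The runtime and throughput rows together imply the approximate size of the evaluation set, which this card does not state directly. The back-of-the-envelope calculation below is an inference from the table, not a reported figure.

```python
runtime_s = 16.01           # from the results table
samples_per_second = 23.05  # from the results table
accuracy = 0.9485           # from the results table

# Throughput times runtime approximates the number of evaluated tickets
n_samples = round(runtime_s * samples_per_second)
# Number of correct predictions consistent with the reported accuracy
n_correct = round(accuracy * n_samples)

print(n_samples)                        # 369
print(n_correct)                        # 350
print(round(n_correct / n_samples, 4))  # 0.9485
```

Under this estimate, roughly 19 of about 369 evaluation tickets were misclassified.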
## Technical Specifications
### Model Architecture and Objective
The model architecture is based on DistilBERT, a distilled version of BERT. It consists of the base DistilBERT model with a classification head on top. The model was fine-tuned with cross-entropy loss to predict the correct category for each ticket.
## Model Card Contact
For inquiries about this model, please open an issue on the model repository.