Create README.md
README.md
ADDED
@@ -0,0 +1,165 @@
---
language:
- en
license: apache-2.0
tags:
- text-classification
- customer-support
- ticket-classification
- distilbert
datasets:
- custom
metrics:
- accuracy
model-index:
- name: ticket-classification-v1
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Custom Ticket Dataset
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9485
---

# Model Card for Dragneel/ticket-classification-v1

This model fine-tunes DistilBERT base uncased to classify customer support tickets into four categories. It achieves **94.85% accuracy** on the evaluation dataset.

## Model Details

### Model Description

The model automatically categorizes customer support tickets based on their content. It assigns each ticket to one of the following categories:

- **Billing Question**: Issues related to billing, payments, subscriptions, etc.
- **Feature Request**: Suggestions for new features or improvements
- **General Inquiry**: General questions about products or services
- **Technical Issue**: Technical problems, bugs, errors, etc.

The model uses DistilBERT as its base architecture, a distilled version of BERT that is smaller, faster, and more efficient while retaining good performance.

- **Developed by:** Dragneel
- **Model type:** Text Classification
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased)

## Uses

### Direct Use

This model can be used directly for:
- Automated ticket routing and prioritization
- Customer support workflow optimization
- Analytics on ticket categories
- Real-time ticket classification

### Out-of-Scope Use

This model should not be used for:
- Processing sensitive customer information without proper privacy measures
- Making final decisions without human review for complex or critical issues
- Classifying tickets in languages other than English
- Categorizing content outside the customer support domain

## Bias, Risks, and Limitations

- The model was trained on a specific dataset and may not generalize well to significantly different customer support contexts
- Performance may degrade for very technical or domain-specific tickets not represented in the training data
- Very short or ambiguous tickets might be misclassified

### Recommendations

Users should review classifications for accuracy, especially for tickets that fall on the boundary between categories. Consider retraining the model on domain-specific data if you use it in a specialized industry.
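
One way to act on this recommendation is to flag low-confidence predictions for manual review. The sketch below assumes predictions in the `{"label", "score"}` form that the `text-classification` pipeline returns. The `needs_review` helper, the 0.80 threshold, and the sample scores are hypothetical; a suitable threshold would have to be tuned on your own tickets.

```python
def needs_review(prediction, threshold=0.80):
    """Return True when the top prediction's score falls below the threshold.

    `threshold` is an illustrative value, not one recommended by the model author.
    """
    return prediction["score"] < threshold


# Made-up predictions in the pipeline's output format
predictions = [
    {"label": "Billing Question", "score": 0.97},
    {"label": "General Inquiry", "score": 0.55},
]

# Keep only the predictions a human should double-check
flagged = [p for p in predictions if needs_review(p)]
for p in flagged:
    print(f"Review needed: {p['label']} ({p['score']:.2f})")
```

A score-based filter like this only catches predictions the model itself is unsure about; confidently wrong classifications still require spot checks.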

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="Dragneel/ticket-classification-v1")

# Example tickets
tickets = [
    "I was charged twice for my subscription this month. Can you help?",
    "The app keeps crashing whenever I try to upload a file",
    "Would it be possible to add dark mode to the dashboard?",
    "What are your business hours?",
]

# Classify each ticket and print the top prediction
for ticket in tickets:
    result = classifier(ticket)
    print(f"Ticket: {ticket}")
    print(f"Category: {result[0]['label']}")
    print(f"Confidence: {result[0]['score']:.4f}")
    print()
```

### ID to Label Mapping

```python
id_to_label = {
    0: 'Billing Question',
    1: 'Feature Request',
    2: 'General Inquiry',
    3: 'Technical Issue'
}
```
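
The mapping is useful when predictions come back with generic names such as `LABEL_2`, which is what `transformers` produces when a model config does not define `id2label`. Whether that applies to this checkpoint is not stated here, so the following is a sketch under that assumption; `to_category` is a hypothetical helper.

```python
id_to_label = {
    0: 'Billing Question',
    1: 'Feature Request',
    2: 'General Inquiry',
    3: 'Technical Issue'
}


def to_category(label):
    """Map a generic 'LABEL_<id>' string to its category name.

    Labels that are already category names are returned unchanged.
    """
    if label.startswith("LABEL_"):
        class_id = int(label.split("_", 1)[1])
        return id_to_label[class_id]
    return label


print(to_category("LABEL_3"))
print(to_category("Feature Request"))
```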

## Training Details

### Training Data

The model was trained on a dataset of customer support tickets that includes diverse examples across all four categories. Each ticket typically contains a title and a description detailing the customer's issue or request.

### Training Procedure

#### Training Hyperparameters

- **Learning rate:** 0.001
- **Batch size:** 2
- **Epochs:** 10 (with early stopping)
- **Weight decay:** 0.01
- **Early stopping patience:** 2 epochs
- **Optimizer:** AdamW
- **Training regime:** fp32
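
Early stopping with a patience of 2 can be read as: stop once the monitored metric has failed to improve for 2 consecutive evaluations. The sketch below illustrates that rule under two assumptions that the card does not confirm: evaluation runs once per epoch, and the monitored metric is the evaluation loss. The `stopping_epoch` helper and the loss values are made up for illustration and are not taken from the actual training run.

```python
def stopping_epoch(eval_losses, patience=2, max_epochs=10):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epochs without a strictly
    lower evaluation loss, or when the epochs run out.
    """
    best = float("inf")
    epochs_without_improvement = 0
    losses = eval_losses[:max_epochs]
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses)


# Hypothetical per-epoch evaluation losses
losses = [0.60, 0.41, 0.30, 0.33, 0.31, 0.29]
print(stopping_epoch(losses))
```

With these sample losses, epoch 3 is the best one and epochs 4 and 5 both fail to beat it, so training stops before ever seeing the lower loss at epoch 6. That is the usual trade-off of a small patience value.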

## Evaluation

### Testing Data, Factors & Metrics

#### Metrics

The model is evaluated using the following metrics:
- Accuracy: Percentage of correctly classified tickets
- Loss: Cross-entropy loss on the evaluation dataset
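
Both metrics can be computed in a few lines. The sketch below uses four made-up tickets with hypothetical class probabilities; it only illustrates the definitions and does not reproduce the model's evaluation.

```python
import math


def accuracy(true_ids, predicted_ids):
    """Fraction of examples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_ids, predicted_ids))
    return correct / len(true_ids)


def cross_entropy(true_ids, probabilities):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(probs[t]) for t, probs in zip(true_ids, probabilities))
    return -total / len(true_ids)


# Hypothetical data: true class ids and predicted class probabilities
true_ids = [0, 3, 1, 2]
probabilities = [
    [0.90, 0.04, 0.03, 0.03],
    [0.05, 0.05, 0.10, 0.80],
    [0.10, 0.70, 0.10, 0.10],
    [0.50, 0.10, 0.30, 0.10],  # true class 2 gets only 0.30, so this one is wrong
]
# The predicted class is the one with the highest probability
predicted_ids = [max(range(len(p)), key=p.__getitem__) for p in probabilities]

acc = accuracy(true_ids, predicted_ids)
loss = cross_entropy(true_ids, probabilities)
print(f"accuracy={acc:.2f} loss={loss:.4f}")
```

Accuracy only counts whether the top class is right, while cross-entropy also penalizes low confidence in the correct class, which is why both are reported.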

### Results

The model achieved the following metrics on the evaluation dataset:

| Metric | Value |
|--------|-------|
| Accuracy | 94.85% |
| Loss | 0.248 |
| Runtime | 16.01s |
| Samples/second | 23.05 |
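
The runtime and throughput rows imply an approximate size for the evaluation set, which the card does not state directly. The arithmetic below infers it from the rounded table values, so the resulting counts are estimates rather than reported figures.

```python
runtime_s = 16.01            # Runtime row
samples_per_second = 23.05   # Samples/second row
accuracy = 0.9485            # Accuracy row

# Estimated number of evaluation samples: runtime * throughput
n_samples = round(runtime_s * samples_per_second)

# Estimated number of correctly classified tickets
n_correct = round(accuracy * n_samples)

print(n_samples, n_correct, n_samples - n_correct)
```

Under this estimate, roughly 369 tickets were evaluated and about 19 of them were misclassified.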

## Technical Specifications

### Model Architecture and Objective

The model architecture is based on DistilBERT, a distilled version of BERT. It consists of the base DistilBERT model with a classification head on top. The model was fine-tuned with cross-entropy loss to predict the correct category for each ticket.

## Model Card Contact

For inquiries about this model, please open an issue on the model repository.