	---
	language:
	- en
	license: apache-2.0
	tags:
	- text-classification
	- customer-support
	- ticket-classification
	- distilbert
	datasets:
	- custom
	metrics:
	- accuracy
	model-index:
	- name: ticket-classification-v1
	  results:
	  - task:
	      type: text-classification
	      name: Text Classification
	    dataset:
	      name: Custom Ticket Dataset
	      type: custom
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.9485
	---

	# Model Card for Dragneel/ticket-classification-v1

	This model is a fine-tuned version of DistilBERT base uncased that classifies customer support tickets into four categories. It achieves 94.85% accuracy on the evaluation dataset.

	## Model Details

	### Model Description

	This model is designed to automatically categorize customer support tickets based on their content. It can classify tickets into the following categories:

	- Billing Question: Issues related to billing, payments, subscriptions, etc.
	- Feature Request: Suggestions for new features or improvements
	- General Inquiry: General questions about products or services
	- Technical Issue: Technical problems, bugs, errors, etc.

	The model uses DistilBERT as its base architecture. DistilBERT is a distilled version of BERT that is smaller and faster while retaining most of BERT's accuracy.

	- Developed by: Dragneel
	- Model type: Text Classification
	- Language(s): English
	- License: Apache 2.0
	- Finetuned from model: [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased)

	## Uses

	### Direct Use

	This model can be directly used for:
	- Automated ticket routing and prioritization
	- Customer support workflow optimization
	- Analytics on ticket categories
	- Real-time ticket classification

	### Out-of-Scope Use

	This model should not be used for:
	- Processing sensitive customer information without proper privacy measures
	- Making final decisions without human review for complex or critical issues
	- Classifying tickets in languages other than English
	- Categorizing content outside the customer support domain

	## Bias, Risks, and Limitations

	- The model was trained on a specific dataset and may not generalize well to significantly different customer support contexts
	- Performance may degrade for very technical or domain-specific tickets not represented in the training data
	- Very short or ambiguous tickets might be misclassified

	### Recommendations

	Users should review classifications for accuracy, especially for tickets that fall on the boundary between categories. Consider retraining the model on domain-specific data before deploying it in a specialized industry.
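
One way to act on this recommendation is to send low-confidence or near-tie predictions to a human reviewer. The sketch below is a minimal illustration, not part of the released model: the helper name `needs_review` and the thresholds `min_score=0.80` and `min_margin=0.20` are assumptions chosen for the example and should be tuned on held-out data. It expects one `{"label": ..., "score": ...}` dict per category, the same form the pipeline uses for its results, and the scores shown are invented.

```python
from typing import List


def needs_review(scores: List[dict],
                 min_score: float = 0.80,
                 min_margin: float = 0.20) -> bool:
    """Return True when a prediction should be checked by a human.

    `scores` holds one {"label": ..., "score": ...} dict per category.
    The thresholds are illustrative assumptions, not values from the model card.
    """
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    top = ranked[0]["score"]
    # With a single score there is no runner-up to compare against.
    runner_up = ranked[1]["score"] if len(ranked) > 1 else 0.0
    # Flag the ticket if the model is unsure overall or two categories are close.
    return top < min_score or (top - runner_up) < min_margin


# Invented scores for two tickets.
confident = [
    {"label": "Billing Question", "score": 0.97},
    {"label": "Technical Issue", "score": 0.02},
    {"label": "General Inquiry", "score": 0.01},
    {"label": "Feature Request", "score": 0.00},
]
borderline = [
    {"label": "Feature Request", "score": 0.52},
    {"label": "General Inquiry", "score": 0.44},
    {"label": "Technical Issue", "score": 0.03},
    {"label": "Billing Question", "score": 0.01},
]

print(needs_review(confident))   # False
print(needs_review(borderline))  # True
```

Checking the margin as well as the top score catches tickets that sit between two categories, which is the case the recommendation singles out.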

	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	from transformers import pipeline

	# Load the model
	classifier = pipeline("text-classification", model="Dragneel/ticket-classification-v1")

	# Example tickets
	tickets = [
	    "I was charged twice for my subscription this month. Can you help?",
	    "The app keeps crashing whenever I try to upload a file",
	    "Would it be possible to add dark mode to the dashboard?",
	    "What are your business hours?",
	]

	# Classify each ticket and print the top category with its confidence score
	for ticket in tickets:
	    result = classifier(ticket)
	    print(f"Ticket: {ticket}")
	    print(f"Category: {result[0]['label']}")
	    print(f"Confidence: {result[0]['score']:.4f}")
	    print()
	```

	### ID to Label Mapping

	```python
	id_to_label = {
	    0: 'Billing Question',
	    1: 'Feature Request',
	    2: 'General Inquiry',
	    3: 'Technical Issue',
	}
	```
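
The mapping matters when you work with raw model outputs instead of the pipeline. As a small sketch, the function below applies a softmax to a vector of four logits and looks up the winning index in `id_to_label`. The logit values are made up for illustration, and `logits_to_prediction` is a hypothetical helper name, not something shipped with the model.

```python
import math

id_to_label = {
    0: "Billing Question",
    1: "Feature Request",
    2: "General Inquiry",
    3: "Technical Issue",
}


def logits_to_prediction(logits):
    """Convert raw class logits to a (label, probability) pair."""
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return id_to_label[best], probs[best]


# Made-up logits in class-index order 0..3.
label, prob = logits_to_prediction([0.1, -1.2, 0.3, 2.4])
print(label, round(prob, 3))
```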

	## Training Details

	### Training Data

	The model was trained on a dataset of customer support tickets that includes diverse examples across all four categories. Each ticket typically contains a title and a description detailing the customer's issue or request.

	### Training Procedure

	#### Training Hyperparameters

	- Learning rate: 0.001
	- Batch size: 2
	- Epochs: 10 (with early stopping)
	- Weight decay: 0.01
	- Early stopping patience: 2 epochs
	- Optimizer: AdamW
	- Training regime: fp32
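
The interaction between the 10-epoch budget and the patience of 2 can be sketched in plain Python. This illustrates the general early-stopping rule and is not the author's training script: the card does not say which metric was monitored, so the sketch assumes evaluation loss (lower is better), and the loss values are invented.

```python
def stopping_epoch(eval_losses, max_epochs=10, patience=2):
    """Return the 1-based epoch after which training stops.

    Training ends early once the monitored loss has failed to improve
    for `patience` consecutive evaluations, or after `max_epochs`.
    """
    best = float("inf")
    stale = 0  # consecutive evaluations without improvement
    for epoch, loss in enumerate(eval_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(eval_losses), max_epochs)


# Invented loss curve: improvement stalls after epoch 3.
losses = [0.61, 0.35, 0.25, 0.27, 0.26, 0.24, 0.23]
print(stopping_epoch(losses))  # 5
```

In this invented run, epochs 4 and 5 both fail to beat the best loss from epoch 3, so training stops after epoch 5 and the later improvements are never reached.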

	## Evaluation

	### Testing Data, Factors & Metrics

	#### Metrics

	The model is evaluated using the following metrics:
	- Accuracy: Percentage of correctly classified tickets
	- Loss: Cross-entropy loss on the evaluation dataset
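
For readers who want the two metrics spelled out, here is a minimal sketch of how they are computed from class probabilities. The three-ticket example is invented, and the function names are illustrative. It does not reproduce the reported evaluation figures.

```python
import math


def accuracy(true_ids, pred_ids):
    """Fraction of tickets whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_ids, pred_ids))
    return correct / len(true_ids)


def cross_entropy(true_ids, probs):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[t]) for t, p in zip(true_ids, probs)) / len(true_ids)


# Toy example with three tickets and four classes (values are invented).
true_ids = [0, 3, 1]
probs = [
    [0.90, 0.05, 0.03, 0.02],  # correct and confident
    [0.10, 0.10, 0.10, 0.70],  # correct
    [0.20, 0.30, 0.40, 0.10],  # wrong: class 2 is predicted instead of class 1
]
# Predicted class is the index with the highest probability.
pred_ids = [max(range(4), key=row.__getitem__) for row in probs]

print(accuracy(true_ids, pred_ids))
print(cross_entropy(true_ids, probs))
```

Accuracy only counts whether the top class is right, while cross-entropy also penalizes correct predictions made with low confidence, which is why the two are reported together.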

	### Results

	The model achieved the following metrics on the evaluation dataset:

	| Metric | Value |
	|--------|-------|
	| Accuracy | 94.85% |
	| Loss | 0.248 |
	| Runtime | 16.01 s |
	| Samples/second | 23.05 |
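
The runtime and throughput figures also imply the size of the evaluation set, which the card does not state directly. The short calculation below is an inference from the reported numbers, not a figure from the author: multiplying runtime by samples per second gives roughly 369 samples, and 350 correct predictions out of 369 reproduces the reported accuracy.

```python
runtime_s = 16.01      # reported evaluation runtime in seconds
samples_per_s = 23.05  # reported throughput
accuracy = 0.9485      # reported accuracy

# Estimated evaluation-set size (inferred, not stated in the card).
n_samples = round(runtime_s * samples_per_s)
# Number of correct predictions consistent with the reported accuracy.
n_correct = round(accuracy * n_samples)

print(n_samples)                        # 369
print(n_correct)                        # 350
print(round(n_correct / n_samples, 4))  # 0.9485
```

On an evaluation set of this size, each misclassified ticket moves accuracy by about 0.27 percentage points, so small differences between runs should be read with that in mind.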

	## Technical Specifications

	### Model Architecture and Objective

	The model architecture is based on DistilBERT, a distilled version of BERT. It consists of the base DistilBERT model with a classification head on top. The model was fine-tuned with cross-entropy loss to predict one of the four categories for each ticket.

	## Model Card Contact

	For inquiries about this model, please open an issue on the model repository.