---
language:
- multilingual
- de
- en
license: mit
library_name: sklearn
tags:
- sklearn
- skops
- text-classification
- english
- german
datasets:
- philipp-zettl/GGU-xx
model_format: pickle
model_file: GGU-CLF.pkl
get_started_code: |-
  ```python
  import pickle
  pkl_filename = 'GGU-CLF.pkl'
  with open(pkl_filename, 'rb') as file:
      clf = pickle.load(file)
  ```
model_card_authors: https://huggingface.co/philipp-zettl
limitations: This model is ready to be used in production.
model_description: >-
  GGU (Greeting/Gratitude/Unknown) classifier for natural language chat
  messages.
model_id: GGU-CLF
funded_by: https://huggingface.co/easybits
repo: https://huggingface.co/philipp-zettl/GGU-CLF
widget:
- example_title: 'Greeting (English #1)'
  text: Hey there
- example_title: 'Greeting (English #2)'
  text: Good to see you
- example_title: Greeting (German)
  text: Guten Morgen
- example_title: 'Gratitude (English #1)'
  text: Thank you
- example_title: 'Gratitude (English #2)'
  text: Cheers mate
---
Model description
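This is a Multinomial Naive Bayes model trained on a custom dataset. Text is vectorized with a TF-IDF vectorizer over character 1- to 3-grams taken within word boundaries (`TfidfVectorizer(analyzer='char_wb')`). The model classifies user text into the following classes: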
- 0: Greeting
- 1: Gratitude
- 2: Unknown
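A minimal sketch for turning predictions into readable names, assuming the pickled pipeline returns the integer ids listed above (check `clf.classes_` first; if it already returns strings, this step is unnecessary). `GGU_LABELS` and `decode` are hypothetical helpers, not part of the model.

```python
# Class ids as listed in this model card (assumed to be what clf.predict returns).
GGU_LABELS = {0: "Greeting", 1: "Gratitude", 2: "Unknown"}


def decode(predictions):
    """Map integer class ids (plain or NumPy ints) to their class names."""
    return [GGU_LABELS[int(p)] for p in predictions]


print(decode([0, 2, 1]))  # ['Greeting', 'Unknown', 'Gratitude']
```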
Intended uses & limitations
Direct use
Use this model to classify messages from natural language chats.
Out Of Scope Usage
The model was not trained on multi-sentence samples, so avoid passing them. English and German are the only officially tested and supported languages; any other language is considered out of scope.
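One way to respect the single-sentence restriction is to screen messages before classifying them. The sketch below uses a naive punctuation heuristic (an assumption, not something the model or dataset defines) and will misjudge abbreviations such as "Dr. Smith".

```python
import re

# Naive heuristic (assumption): '.', '!' or '?' followed by whitespace and
# more text is treated as a sentence boundary.
_SENTENCE_BOUNDARY = re.compile(r"[.!?]+\s+(?=\S)")


def is_single_sentence(message: str) -> bool:
    """Return True if the message looks like a single sentence."""
    return len(_SENTENCE_BOUNDARY.split(message.strip())) == 1


print(is_single_sentence("Thank you!"))                     # True
print(is_single_sentence("Hi there. Thanks for the help."))  # False
```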
Training Procedure
This model was trained using the philipp-zettl/GGU-xx dataset.
You can find its performance metrics under Evaluation Results.
Hyperparameters
Hyperparameter | Value |
---|---|
memory | |
steps | [('vect', TfidfVectorizer(analyzer='char_wb', lowercase=False, ngram_range=(1, 3))), ('clf', MultinomialNB(alpha=0.112))] |
verbose | False |
vect | TfidfVectorizer(analyzer='char_wb', lowercase=False, ngram_range=(1, 3)) |
clf | MultinomialNB(alpha=0.112) |
vect__analyzer | char_wb |
vect__binary | False |
vect__decode_error | strict |
vect__dtype | <class 'numpy.float64'> |
vect__encoding | utf-8 |
vect__input | content |
vect__lowercase | False |
vect__max_df | 1.0 |
vect__max_features | |
vect__min_df | 1 |
vect__ngram_range | (1, 3) |
vect__norm | l2 |
vect__preprocessor | |
vect__smooth_idf | True |
vect__stop_words | |
vect__strip_accents | |
vect__sublinear_tf | False |
vect__token_pattern | (?u)\b\w\w+\b |
vect__tokenizer | |
vect__use_idf | True |
vect__vocabulary | |
clf__alpha | 0.112 |
clf__class_prior | |
clf__fit_prior | True |
clf__force_alpha | True |
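To see what the `char_wb` analyzer actually feeds the classifier, the vectorizer settings from the table can be inspected directly. Each whitespace-separated word is padded with a space on both sides and split into character 1- to 3-grams; with `lowercase=False`, case is preserved. Note that scikit-learn only uses `token_pattern` when `analyzer='word'`, so that row has no effect here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same vectorizer settings as listed in the hyperparameter table.
vect = TfidfVectorizer(analyzer="char_wb", lowercase=False, ngram_range=(1, 3))
analyze = vect.build_analyzer()

# "Hi" becomes " Hi " before n-grams of length 1, 2 and 3 are extracted.
grams = analyze("Hi")
print(grams)  # [' ', 'H', 'i', ' ', ' H', 'Hi', 'i ', ' Hi', 'Hi ']
```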
Model Plot
Pipeline(steps=[('vect', TfidfVectorizer(analyzer='char_wb', lowercase=False, ngram_range=(1, 3))), ('clf', MultinomialNB(alpha=0.112))])
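For experimentation, the pipeline in the plot can be rebuilt from the listed hyperparameters. The sketch below fits it on a handful of made-up messages (hypothetical data, not the philipp-zettl/GGU-xx dataset), so its predictions say nothing about the published model's quality; it only shows the architecture and how `predict_proba` exposes per-class probabilities.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Architecture rebuilt from the hyperparameter table above.
pipe = Pipeline(steps=[
    ("vect", TfidfVectorizer(analyzer="char_wb", lowercase=False, ngram_range=(1, 3))),
    ("clf", MultinomialNB(alpha=0.112)),
])

# Hypothetical toy data; labels follow the card: 0 Greeting, 1 Gratitude, 2 Unknown.
texts = [
    "Hey there", "Good morning", "Guten Morgen", "Hello",
    "Thank you", "Thanks a lot", "Danke schön", "Cheers mate",
    "What time is it?", "Where is my order", "Wie spät ist es?", "Open the file",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

pipe.fit(texts, labels)
print(pipe.predict(["Good morning"]))
# One probability per class, in the order of pipe.classes_.
print(pipe.predict_proba(["Thank you"]).round(3))
```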
Evaluation Results
Metric | Value |
---|---|
accuracy | 0.951691 |
f1 score | 0.951691 |
Evaluation Methods
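The model is evaluated on the test split of the dataset using accuracy and the micro-averaged F1 score. For single-label multiclass classification, the micro-averaged F1 score equals accuracy, which is why both values in the table above are identical.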
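The equality of the two metrics can be checked with scikit-learn on any single-label predictions. The labels below are hypothetical and only illustrate the computation.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels (0 Greeting, 1 Gratitude, 2 Unknown); 5 of 7 are correct.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

acc = accuracy_score(y_true, y_pred)
micro_f1 = f1_score(y_true, y_pred, average="micro")
print(acc, micro_f1)  # both equal 5/7
```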
Confusion matrix
Classification Report
index | precision | recall | f1-score | support |
---|---|---|---|---|
greeting | 0.926471 | 0.969231 | 0.947368 | 65 |
gratitude | 0.982456 | 0.888889 | 0.933333 | 63 |
unknown | 0.95122 | 0.987342 | 0.968944 | 79 |
macro avg | 0.953382 | 0.948487 | 0.949882 | 207 |
weighted avg | 0.952955 | 0.951691 | 0.951331 | 207 |
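The summary rows follow from the per-class rows: "macro avg" is the unweighted mean over the three classes, and "weighted avg" weights each class by its support. The support-weighted recall is the number of correct predictions divided by 207, which is why it matches the accuracy of 0.951691. The sketch below recomputes them from the rounded values in the table.

```python
# Per-class (precision, recall, f1-score, support) copied from the report above.
report = {
    "greeting": (0.926471, 0.969231, 0.947368, 65),
    "gratitude": (0.982456, 0.888889, 0.933333, 63),
    "unknown": (0.95122, 0.987342, 0.968944, 79),
}


def macro(i):
    """Unweighted mean of column i over the classes."""
    return sum(row[i] for row in report.values()) / len(report)


def weighted(i):
    """Support-weighted mean of column i over the classes."""
    total = sum(row[3] for row in report.values())
    return sum(row[i] * row[3] for row in report.values()) / total


print(round(macro(2), 6), round(weighted(1), 6))  # 0.949882 0.951691
```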
How to Get Started with the Model
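```python
import pickle

# Path to the pickled pipeline (model_file in this card's metadata)
pkl_filename = 'GGU-CLF.pkl'
with open(pkl_filename, 'rb') as file:
    clf = pickle.load(file)
```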
Model Card Authors
This model card was written by the following authors: