metadata

language:
  - multilingual
  - de
  - en
license: mit
library_name: sklearn
tags:
  - sklearn
  - skops
  - text-classification
  - english
  - german
datasets:
  - philipp-zettl/GGU-xx
model_format: pickle
model_file: GGU-CLF.pkl
get_started_code: |-
  ```python
  import pickle
  # Local path to the downloaded model file (see model_file)
  pkl_filename = 'GGU-CLF.pkl'
  with open(pkl_filename, 'rb') as file:
      clf = pickle.load(file)
  ```
model_card_authors: https://huggingface.co/philipp-zettl
limitations: This model is ready to be used in production.
model_description: >-
  GGU (Greeting/Gratitude/Unknown) classifier for natural language chat
  messages.
model_id: GGU-CLF
funded_by: https://huggingface.co/easybits
repo: https://huggingface.co/philipp-zettl/GGU-CLF
widget:
  - example_title: 'Greeting (English #1)'
    text: Hey there
  - example_title: 'Greeting (English #2)'
    text: Good to see you
  - example_title: Greeting (German)
    text: Guten Morgen
  - example_title: 'Gratitude (English #1)'
    text: Thank you
  - example_title: 'Gratitude (English #2)'
    text: Cheers mate

Model description

This is a Multinomial Naive Bayes model trained on a custom dataset. Text is vectorized with a TF-IDF vectorizer over character n-grams (1 to 3 characters, within word boundaries). The model classifies user text into the following classes:

0: Greeting
1: Gratitude
2: Unknown

Intended uses & limitations

Direct use

Use this model to classify messages from natural language chats.

Out Of Scope Usage

The model was not trained on multi-sentence samples, so avoid them. English and German are the officially tested and supported languages; any other language is considered out of scope.

Training Procedure

This model was trained using the philipp-zettl/GGU-xx dataset.

You can find its performance metrics under Evaluation Results.

Hyperparameters

Hyperparameter	Value
memory
steps	[('vect', TfidfVectorizer(analyzer='char_wb', lowercase=False, ngram_range=(1, 3))), ('clf', MultinomialNB(alpha=0.112))]
verbose	False
vect	TfidfVectorizer(analyzer='char_wb', lowercase=False, ngram_range=(1, 3))
clf	MultinomialNB(alpha=0.112)
vect__analyzer	char_wb
vect__binary	False
vect__decode_error	strict
vect__dtype	<class 'numpy.float64'>
vect__encoding	utf-8
vect__input	content
vect__lowercase	False
vect__max_df	1.0
vect__max_features
vect__min_df	1
vect__ngram_range	(1, 3)
vect__norm	l2
vect__preprocessor
vect__smooth_idf	True
vect__stop_words
vect__strip_accents
vect__sublinear_tf	False
vect__token_pattern	(?u)\b\w\w+\b
vect__tokenizer
vect__use_idf	True
vect__vocabulary
clf__alpha	0.112
clf__class_prior
clf__fit_prior	True
clf__force_alpha	True

Model Plot

Pipeline(steps=[('vect',TfidfVectorizer(analyzer='char_wb', lowercase=False,ngram_range=(1, 3))),('clf', MultinomialNB(alpha=0.112))])
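
The pipeline shown above can be rebuilt from the listed hyperparameters. The following is a minimal sketch, assuming scikit-learn is installed; the training texts and labels are made-up placeholders, not samples from the philipp-zettl/GGU-xx dataset, and the integer labels follow the class list in the model description.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Same steps and hyperparameters as listed in the hyperparameter table.
pipeline = Pipeline(steps=[
    ("vect", TfidfVectorizer(analyzer="char_wb", lowercase=False, ngram_range=(1, 3))),
    ("clf", MultinomialNB(alpha=0.112)),
])

# Hypothetical toy data for illustration only (0: Greeting, 1: Gratitude, 2: Unknown).
texts = ["Hey there", "Guten Morgen", "Thank you", "Danke",
         "What time is it", "Wo ist der Bahnhof"]
labels = [0, 0, 1, 1, 2, 2]
pipeline.fit(texts, labels)

# char_wb builds character n-grams inside word boundaries,
# padding each word with a space on both sides.
ngrams = pipeline.named_steps["vect"].build_analyzer()("Hi")
print(ngrams)

# predict() returns one label per input message.
print(pipeline.predict(["Good morning"]))
```

Because `lowercase=False`, "Hi" and "hi" produce different n-grams, so capitalization is part of the signal the classifier sees.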

Evaluation Results

Metric	Value
accuracy	0.951691
f1 score	0.951691

Evaluation Methods

The model is evaluated on validation data from the dataset's test split, using accuracy and F1-score with micro average.

Confusion matrix

Classification Report

index	precision	recall	f1-score	support
greeting	0.926471	0.969231	0.947368	65
gratitude	0.982456	0.888889	0.933333	63
unknown	0.95122	0.987342	0.968944	79
macro avg	0.953382	0.948487	0.949882	207
weighted avg	0.952955	0.951691	0.951331	207
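
The per-class rows above are enough to recover the headline metrics. The sketch below assumes the true-positive counts implied by each recall and support (recall times support, rounded to whole samples) and shows why micro-averaged F1 equals accuracy for single-label classification. Variable names are illustrative.

```python
# (precision, recall, support) per class, copied from the classification report.
report = {
    "greeting": (0.926471, 0.969231, 65),
    "gratitude": (0.982456, 0.888889, 63),
    "unknown": (0.95122, 0.987342, 79),
}

# True positives per class: recall * support, rounded to the nearest sample.
tp = {name: round(r * n) for name, (p, r, n) in report.items()}
# Number of predictions per class: true positives / precision.
predicted = {name: round(tp[name] / p) for name, (p, r, n) in report.items()}

total = sum(n for _, _, n in report.values())
micro_precision = sum(tp.values()) / sum(predicted.values())
micro_recall = sum(tp.values()) / total  # identical to accuracy
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(2 * tp[k] / (predicted[k] + report[k][2]) for k in report) / len(report)

print(f"accuracy = {micro_recall:.6f}, micro F1 = {micro_f1:.6f}, macro F1 = {macro_f1:.6f}")
# prints: accuracy = 0.951691, micro F1 = 0.951691, macro F1 = 0.949882
```

Every sample receives exactly one prediction, so the total number of predictions equals the total support (207). Micro precision and micro recall therefore share the same numerator and denominator, which is why the accuracy and F1 rows in the metric table are identical.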

How to Get Started with the Model

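import pickle
# Local path to the downloaded model file (model_file in the metadata)
pkl_filename = 'GGU-CLF.pkl'
with open(pkl_filename, 'rb') as file:
    clf = pickle.load(file)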
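
Once loaded, the classifier can be applied to single-sentence chat messages. The helper below is a sketch with assumed names (`CLASS_NAMES`, `classify`), not part of the published model. It assumes the loaded object exposes a scikit-learn style `predict` method, maps integer outputs to the class names from the model description, and passes string labels through unchanged.

```python
# Class indices as listed in the model description.
CLASS_NAMES = {0: "Greeting", 1: "Gratitude", 2: "Unknown"}


def classify(clf, messages):
    """Return (message, label) pairs for an iterable of chat messages.

    `clf` is any object with a scikit-learn style `predict` method,
    for example the pipeline loaded from the pickle file.
    """
    messages = list(messages)
    predictions = clf.predict(messages)
    results = []
    for message, pred in zip(messages, predictions):
        if isinstance(pred, str):
            # The model already returned a string label.
            label = pred
        else:
            # Map an integer class index to its name; fall back to the raw value.
            label = CLASS_NAMES.get(int(pred), str(pred))
        results.append((message, label))
    return results
```

For example, `classify(clf, ["Hey there", "Thank you"])` returns one pair per message.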

Model Card Authors

This model card was written by the following authors:

philipp-zettl