---
language:
- multilingual
- de
- en
license: mit
library_name: sklearn
tags:
- sklearn
- skops
- text-classification
- english
- german
datasets:
- philipp-zettl/GGU-xx
model_format: pickle
model_file: GGU-CLF.pkl
get_started_code: "```python\nimport pickle\nwith open(pkl_filename, 'rb') as file:\n    clf = pickle.load(file)\n```"
model_card_authors: https://huggingface.co/philipp-zettl
limitations: This model is ready to be used in production.
model_description: GGU (Greeting/Gratitude/Unknown) classifier for natural language chat messages.
model_id: GGU-CLF
funded_by: https://huggingface.co/easybits
repo: https://huggingface.co/philipp-zettl/GGU-CLF
training_data: https://huggingface.co/datasets/philipp-zettl/GGU-xx
widget:
- example_title: 'Greeting (English #1)'
  text: Hey there
- example_title: 'Greeting (English #2)'
  text: Good to see you
- example_title: Greeting (German)
  text: Guten Morgen
- example_title: 'Gratitude (English #1)'
  text: Thank you
- example_title: 'Gratitude (English #2)'
  text: Cheers mate
---

# Model description

This is a Multinomial Naive Bayes model trained on a custom dataset.
A TF-IDF vectorizer over character n-grams (2 to 3 characters, within word boundaries) is used for vectorization.

It is used to classify user text into the following classes:

- 0: Greeting
- 1: Gratitude
- 2: Unknown

## Intended uses & limitations

### Direct use

Use this model to classify messages from natural language chats.

### Out Of Scope Usage

The model was not trained on multi-sentence samples, so avoid those.

The officially tested and supported languages are **English** and **German**; any other language is considered out of scope.

## Training Procedure

This model was trained using the [philipp-zettl/GGU-xx](https://huggingface.co/datasets/philipp-zettl/GGU-xx) dataset.

You can find its performance metrics under [Evaluation Results](#evaluation-results).

### Hyperparameters
| Hyperparameter      | Value |
|---------------------|-------|
| memory              | |
| steps               | [('vect', TfidfVectorizer(analyzer='char_wb', ngram_range=(2, 3))), ('clf', MultinomialNB(alpha=0.01))] |
| verbose             | False |
| vect                | TfidfVectorizer(analyzer='char_wb', ngram_range=(2, 3)) |
| clf                 | MultinomialNB(alpha=0.01) |
| vect__analyzer      | char_wb |
| vect__binary        | False |
| vect__decode_error  | strict |
| vect__dtype         | |
| vect__encoding      | utf-8 |
| vect__input         | content |
| vect__lowercase     | True |
| vect__max_df        | 1.0 |
| vect__max_features  | |
| vect__min_df        | 1 |
| vect__ngram_range   | (2, 3) |
| vect__norm          | l2 |
| vect__preprocessor  | |
| vect__smooth_idf    | True |
| vect__stop_words    | |
| vect__strip_accents | |
| vect__sublinear_tf  | False |
| vect__token_pattern | (?u)\b\w\w+\b |
| vect__tokenizer     | |
| vect__use_idf       | True |
| vect__vocabulary    | |
| clf__alpha          | 0.01 |
| clf__class_prior    | |
| clf__fit_prior      | True |
| clf__force_alpha    | True |
### Model Plot
`Pipeline(steps=[('vect', TfidfVectorizer(analyzer='char_wb', ngram_range=(2, 3))), ('clf', MultinomialNB(alpha=0.01))])`
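The pipeline shown above can be rebuilt in scikit-learn roughly as follows. This is a minimal sketch, not the author's training script: the toy `texts` and `targets`, the `LABELS` mapping name, and the `classify` helper are hypothetical and exist only for illustration. The real model was trained on the philipp-zettl/GGU-xx dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Class ids as listed in the model description.
LABELS = {0: "Greeting", 1: "Gratitude", 2: "Unknown"}

# Hypothetical toy samples (assumption): stand-ins for the real training data.
texts = [
    "Hey there", "Guten Morgen", "Hello",
    "Thank you", "Danke schön", "Thanks a lot",
    "What time is it?", "Wo ist der Bahnhof?", "Tell me a joke",
]
targets = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Same steps and hyperparameters as in the model plot:
# character n-grams of length 2 to 3 inside word boundaries, then Naive Bayes.
pipeline = Pipeline(steps=[
    ("vect", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))),
    ("clf", MultinomialNB(alpha=0.01)),
])
pipeline.fit(texts, targets)


def classify(message: str) -> str:
    """Return the class name predicted for a single chat message."""
    class_id = int(pipeline.predict([message])[0])
    return LABELS[class_id]


print(classify("Thank you"))
```

Character n-grams make the vectorizer tolerant of small spelling variations, which suits short English and German chat messages.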
## Evaluation Results

| Metric   | Value    |
|----------|----------|
| accuracy | 0.987055 |
| f1 score | 0.987055 |

### Evaluation Methods

The model is evaluated on validation data taken from the dataset's test split, using accuracy and micro-averaged F1 score.

#### Confusion matrix

![Confusion matrix](confusion_matrix.png)

# How to Get Started with the Model

```python
import pickle

pkl_filename = 'GGU-CLF.pkl'  # model file named in this card's metadata
with open(pkl_filename, 'rb') as file:
    clf = pickle.load(file)  # only unpickle files from a source you trust
```

# Model Card Authors

This model card was written by the following authors:

[philipp-zettl](https://huggingface.co/philipp-zettl/)

# Classification Report
| index        | precision | recall   | f1-score | support |
|--------------|-----------|----------|----------|---------|
| greeting     | 0.978261  | 0.978261 | 0.978261 | 92      |
| gratitude    | 1         | 0.977011 | 0.988372 | 87      |
| unknown      | 0.984848  | 1        | 0.992366 | 130     |
| macro avg    | 0.987703  | 0.985091 | 0.986333 | 309     |
| weighted avg | 0.987153  | 0.987055 | 0.987042 | 309     |
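The aggregate rows in this report can be re-derived from the per-class rows. The sketch below copies the per-class precision, recall, and support values from the table and recomputes the accuracy and two of the averages; the variable names are illustrative. It also suggests why the reported accuracy and micro-averaged F1 score are identical: for a single-label multiclass problem, the two coincide.

```python
# Per-class rows copied from the classification report:
# (precision, recall, support)
report = {
    "greeting": (0.978261, 0.978261, 92),
    "gratitude": (1.0, 0.977011, 87),
    "unknown": (0.984848, 1.0, 130),
}

total = sum(support for _, _, support in report.values())

# recall * support recovers the number of correctly classified samples per class.
correct = {
    name: round(recall * support)
    for name, (_, recall, support) in report.items()
}

# Accuracy is the share of correct predictions; in single-label multiclass
# classification it equals the micro-averaged F1 score.
accuracy = sum(correct.values()) / total

# Macro average: unweighted mean over classes.
macro_precision = sum(p for p, _, _ in report.values()) / len(report)

# Weighted average: mean over classes weighted by support.
weighted_recall = sum(r * s for _, r, s in report.values()) / total

print(f"correct per class: {correct}")
print(f"accuracy:          {accuracy:.6f}")
print(f"macro precision:   {macro_precision:.6f}")
print(f"weighted recall:   {weighted_recall:.6f}")
```

Under these numbers, 305 of the 309 test samples are classified correctly, which matches the accuracy of 0.987055 listed under Evaluation Results.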