Model Card for Pebblo Classifier

This model card describes the Pebblo Classifier, a machine learning model for text classification developed by DAXA.AI. It is trained on 21 distinct labels and sorts text into organizational agreement and business document types, along with a few content categories such as MEDICAL_ADVICE, HARMFUL_ADVICE, SEXUAL_CONTENT, and NORMAL_TEXT.

Model Details

Model Description

The Pebblo Classifier is a BERT-based (DistilBERT) model, fine-tuned from distilbert-base-uncased, for use in RAG (Retrieval-Augmented Generation) applications. It classifies text into categories such as "BOARD_MEETING_AGREEMENT" and "CONSULTING_AGREEMENT" to streamline document classification.

Developed by: DAXA.AI
Funded by: Open Source
Model type: Classification model
Language(s) (NLP): English
License: MIT
Finetuned from model: distilbert-base-uncased

Model Sources

Repository: https://huggingface.co/daxa-ai/pebblo-classifier
Demo: https://huggingface.co/spaces/daxa-ai/Daxa-Classifier

Uses

Intended Use

The model is intended for direct use in document classification and can be deployed without additional fine-tuning.
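distilbert-base-uncased accepts at most 512 tokens per input, so longer documents are truncated unless they are split first. The sketch below shows one hypothetical way to handle that; it is not part of the released model. It splits text into windows of 256 words (an arbitrary choice, roughly matching one of the training text sizes) and takes a majority vote over per-chunk labels. `chunk_words` and `majority_label` are illustrative helper names, and the per-chunk labels would come from the classifier code under "How to Get Started with the Model".

```python
from collections import Counter


def chunk_words(text: str, size: int = 256) -> list[str]:
    """Split text into consecutive chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def majority_label(chunk_labels: list[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(chunk_labels).most_common(1)[0][0]


if __name__ == "__main__":
    document = "word " * 600  # placeholder 600-word document
    chunks = chunk_words(document, size=256)
    print(len(chunks))  # 3 (256 + 256 + 88 words)

    # Hypothetical per-chunk predictions from the classifier:
    print(majority_label(["NDA_AGREEMENT", "NORMAL_TEXT", "NDA_AGREEMENT"]))  # NDA_AGREEMENT
```

Majority voting is only one aggregation choice; averaging the per-chunk probability vectors is a common alternative when the softmax outputs are kept.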

Recommendations

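End users should be aware of the model's biases and limitations. In particular, the evaluation below shows weak performance on some labels: ENTERPRISE_AGREEMENT has a precision of only 0.18 (F1-score 0.29), MERGER_AGREEMENT has a recall of 0.45 (F1-score 0.61), and DISTRIBUTION_PARTNER_AGREEMENT has an F1-score of 0.68. Predictions for these labels should be treated with caution.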
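One way to act on these limitations is to route uncertain predictions to human review. The sketch below is a hypothetical policy, not something shipped with the model: it flags a prediction when its softmax probability (for example, the maximum of `probabilities` in the getting-started code) falls below a threshold, 0.8 here as an arbitrary assumption, or when the predicted label scored an F1 below 0.70 in the Metrics table of this card.

```python
# Labels with an F1-score below 0.70 in this card's Metrics table.
LOW_F1_LABELS = {
    "ENTERPRISE_AGREEMENT": 0.29,
    "MERGER_AGREEMENT": 0.61,
    "DISTRIBUTION_PARTNER_AGREEMENT": 0.68,
}


def needs_review(label: str, probability: float, min_probability: float = 0.8) -> bool:
    """Hypothetical policy: flag low-confidence predictions and historically weak labels."""
    return probability < min_probability or label in LOW_F1_LABELS


print(needs_review("FINANCIAL_REPORT_AGREEMENT", 0.97))  # False
print(needs_review("FINANCIAL_REPORT_AGREEMENT", 0.55))  # True
print(needs_review("MERGER_AGREEMENT", 0.99))  # True
```

The threshold should be tuned on held-out data for the intended application rather than taken from this sketch.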

How to Get Started with the Model

Use the code below to get started with the model.

# Import necessary libraries
# (scikit-learn must also be installed so joblib can unpickle the label encoder)
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import joblib
from huggingface_hub import hf_hub_download

# Hugging Face model repository ID
REPO_NAME = "daxa-ai/pebblo-classifier"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(REPO_NAME)
model = AutoModelForSequenceClassification.from_pretrained(REPO_NAME)

# Example text
text = "Please enter your text here."

# Tokenize, truncating inputs longer than DistilBERT's 512-token limit
encoded_input = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference without tracking gradients
with torch.no_grad():
    output = model(**encoded_input)

# Apply softmax to the logits
probabilities = torch.nn.functional.softmax(output.logits, dim=-1)

# Get the index of the predicted label
predicted_label = torch.argmax(probabilities, dim=-1)

# Path to the label encoder file in the repository
LABEL_ENCODER_FILE = "label_encoder.joblib"

# Download (and cache) the label encoder file
filename = hf_hub_download(repo_id=REPO_NAME, filename=LABEL_ENCODER_FILE)

# Load the label encoder
label_encoder = joblib.load(filename)

# Decode the predicted label index into its label name
decoded_label = label_encoder.inverse_transform(predicted_label.numpy())

print(decoded_label)
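The last steps of the snippet above (softmax, argmax, then `inverse_transform`) can be traced without downloading anything. This sketch assumes, without confirmation from the repository, that `label_encoder.joblib` is a scikit-learn `LabelEncoder` fitted on the 21 label strings listed under Training Data. `LabelEncoder` stores its classes in sorted order, so index i maps to the i-th label alphabetically. The logits are made up for illustration.

```python
import math

# Assumption: the repository's encoder was fitted on exactly these 21 labels.
# scikit-learn's LabelEncoder keeps classes_ sorted, so sorted() reproduces its order.
LABELS = sorted([
    "BOARD_MEETING_AGREEMENT", "CONSULTING_AGREEMENT", "CUSTOMER_LIST_AGREEMENT",
    "DISTRIBUTION_PARTNER_AGREEMENT", "EMPLOYEE_AGREEMENT", "ENTERPRISE_AGREEMENT",
    "ENTERPRISE_LICENSE_AGREEMENT", "EXECUTIVE_SEVERANCE_AGREEMENT",
    "FINANCIAL_REPORT_AGREEMENT", "HARMFUL_ADVICE", "INTERNAL_PRODUCT_ROADMAP_AGREEMENT",
    "LOAN_AND_SECURITY_AGREEMENT", "MEDICAL_ADVICE", "MERGER_AGREEMENT", "NDA_AGREEMENT",
    "NORMAL_TEXT", "PATENT_APPLICATION_FILLINGS_AGREEMENT", "PRICE_LIST_AGREEMENT",
    "SETTLEMENT_AGREEMENT", "SEXUAL_CONTENT", "SEXUAL_INCIDENT_REPORT",
])


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: list[float]) -> tuple[str, float]:
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[index], probs[index]


# Made-up logits: class index 14 gets the largest score.
example_logits = [0.0] * len(LABELS)
example_logits[14] = 5.0
label, prob = decode(example_logits)
print(label, round(prob, 3))  # NDA_AGREEMENT 0.881
```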

Training Details

Training Data

The training dataset contains 21 unique labels spanning various document types; the per-label counts below sum to 141,064 entries. Instances come in three text sizes: 128 ± x, 256 ± x, and 512 ± x words, where x is at most 20. The labels and their counts in the dataset are:

Label	Instances
BOARD_MEETING_AGREEMENT	4,206
CONSULTING_AGREEMENT	2,965
CUSTOMER_LIST_AGREEMENT	8,966
DISTRIBUTION_PARTNER_AGREEMENT	5,144
EMPLOYEE_AGREEMENT	3,876
ENTERPRISE_AGREEMENT	4,213
ENTERPRISE_LICENSE_AGREEMENT	8,999
EXECUTIVE_SEVERANCE_AGREEMENT	8,996
FINANCIAL_REPORT_AGREEMENT	11,384
HARMFUL_ADVICE	1,887
INTERNAL_PRODUCT_ROADMAP_AGREEMENT	6,982
LOAN_AND_SECURITY_AGREEMENT	8,957
MEDICAL_ADVICE	3,847
MERGER_AGREEMENT	7,704
NDA_AGREEMENT	5,221
NORMAL_TEXT	8,994
PATENT_APPLICATION_FILLINGS_AGREEMENT	8,802
PRICE_LIST_AGREEMENT	8,906
SETTLEMENT_AGREEMENT	3,737
SEXUAL_CONTENT	8,957
SEXUAL_INCIDENT_REPORT	8,321
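Tallying the per-label counts in the table above gives a quick view of the class balance. The sketch below copies the counts verbatim; they sum to 141,064, and the largest class (FINANCIAL_REPORT_AGREEMENT, 11,384) is about six times the smallest (HARMFUL_ADVICE, 1,887).

```python
# Per-label training counts copied from the table above.
TRAIN_COUNTS = {
    "BOARD_MEETING_AGREEMENT": 4206,
    "CONSULTING_AGREEMENT": 2965,
    "CUSTOMER_LIST_AGREEMENT": 8966,
    "DISTRIBUTION_PARTNER_AGREEMENT": 5144,
    "EMPLOYEE_AGREEMENT": 3876,
    "ENTERPRISE_AGREEMENT": 4213,
    "ENTERPRISE_LICENSE_AGREEMENT": 8999,
    "EXECUTIVE_SEVERANCE_AGREEMENT": 8996,
    "FINANCIAL_REPORT_AGREEMENT": 11384,
    "HARMFUL_ADVICE": 1887,
    "INTERNAL_PRODUCT_ROADMAP_AGREEMENT": 6982,
    "LOAN_AND_SECURITY_AGREEMENT": 8957,
    "MEDICAL_ADVICE": 3847,
    "MERGER_AGREEMENT": 7704,
    "NDA_AGREEMENT": 5221,
    "NORMAL_TEXT": 8994,
    "PATENT_APPLICATION_FILLINGS_AGREEMENT": 8802,
    "PRICE_LIST_AGREEMENT": 8906,
    "SETTLEMENT_AGREEMENT": 3737,
    "SEXUAL_CONTENT": 8957,
    "SEXUAL_INCIDENT_REPORT": 8321,
}

total = sum(TRAIN_COUNTS.values())
largest = max(TRAIN_COUNTS, key=TRAIN_COUNTS.get)
smallest = min(TRAIN_COUNTS, key=TRAIN_COUNTS.get)

print(f"total entries: {total:,}")  # total entries: 141,064
print(f"largest: {largest} ({TRAIN_COUNTS[largest] / total:.1%})")
print(f"smallest: {smallest} ({TRAIN_COUNTS[smallest] / total:.1%})")
print(f"imbalance ratio: {TRAIN_COUNTS[largest] / TRAIN_COUNTS[smallest]:.2f}")
```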

Evaluation

Testing Data & Metrics

Testing Data

Evaluation was performed on a dataset of 86,280 entries, sampled with a temperature between 1 and 1.25 to introduce randomness. The labels and their counts in the dataset are:

Label	Instances
BOARD_MEETING_AGREEMENT	3,975
CONSULTING_AGREEMENT	1,430
CUSTOMER_LIST_AGREEMENT	4,488
DISTRIBUTION_PARTNER_AGREEMENT	6,696
EMPLOYEE_AGREEMENT	1,310
ENTERPRISE_AGREEMENT	1,501
ENTERPRISE_LICENSE_AGREEMENT	7,967
EXECUTIVE_SEVERANCE_AGREEMENT	4,795
FINANCIAL_REPORT_AGREEMENT	4,686
HARMFUL_ADVICE	361
INTERNAL_PRODUCT_ROADMAP_AGREEMENT	3,740
LOAN_AND_SECURITY_AGREEMENT	5,833
MEDICAL_ADVICE	643
MERGER_AGREEMENT	6,557
NDA_AGREEMENT	1,352
NORMAL_TEXT	5,811
PATENT_APPLICATION_FILLINGS_AGREEMENT	5,608
PRICE_LIST_AGREEMENT	5,044
SETTLEMENT_AGREEMENT	5,377
SEXUAL_CONTENT	4,356
SEXUAL_INCIDENT_REPORT	4,750
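The card does not say what the temperature setting applies to. Assuming it is the sampling temperature of a generative model used to produce the test texts, the sketch below illustrates the general mechanism only: logits are divided by a temperature T before the softmax, so T > 1 flattens the distribution and makes less likely tokens more probable. The logits are made up.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; higher means a flatter, more random distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [3.0, 1.0, 0.5, 0.1]  # made-up next-token scores
for t in (1.0, 1.25):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-token prob={max(probs):.3f}, entropy={entropy(probs):.3f}")
```

Moving from T = 1 to T = 1.25 lowers the top token's probability and raises the entropy, which is the "randomness" the setting controls.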

Metrics

Label	precision	recall	f1-score	support
BOARD_MEETING_AGREEMENT	0.92	0.95	0.93	3,975
CONSULTING_AGREEMENT	0.81	0.85	0.83	1,430
CUSTOMER_LIST_AGREEMENT	0.90	0.88	0.89	4,488
DISTRIBUTION_PARTNER_AGREEMENT	0.73	0.63	0.68	6,696
EMPLOYEE_AGREEMENT	0.85	0.84	0.85	1,310
ENTERPRISE_AGREEMENT	0.18	0.70	0.29	1,501
ENTERPRISE_LICENSE_AGREEMENT	0.92	0.78	0.84	7,967
EXECUTIVE_SEVERANCE_AGREEMENT	0.97	0.88	0.92	4,795
FINANCIAL_REPORT_AGREEMENT	0.93	0.99	0.96	4,686
HARMFUL_ADVICE	0.92	0.94	0.93	361
INTERNAL_PRODUCT_ROADMAP_AGREEMENT	0.94	0.98	0.96	3,740
LOAN_AND_SECURITY_AGREEMENT	0.93	0.97	0.95	5,833
MEDICAL_ADVICE	0.93	1.00	0.96	643
MERGER_AGREEMENT	0.93	0.45	0.61	6,557
NDA_AGREEMENT	0.68	0.91	0.78	1,352
NORMAL_TEXT	0.95	0.94	0.95	5,811
PATENT_APPLICATION_FILLINGS_AGREEMENT	0.96	0.99	0.98	5,608
PRICE_LIST_AGREEMENT	0.76	0.79	0.77	5,044
SETTLEMENT_AGREEMENT	0.76	0.78	0.77	5,377
SEXUAL_CONTENT	0.92	0.97	0.94	4,356
SEXUAL_INCIDENT_REPORT	0.99	0.94	0.96	4,750
accuracy			0.84	86,280
macro avg	0.85	0.86	0.84	86,280
weighted avg	0.88	0.84	0.85	86,280
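Each per-label F1-score in the table is the harmonic mean of that label's precision and recall, F1 = 2PR / (P + R). The macro average is the unweighted mean over the 21 labels, and the weighted average weights each label by its support. The sketch below recomputes these from the rounded table values, so small differences (under 0.01) from the printed figures are expected.

```python
# (precision, recall, f1, support) copied from the Metrics table above.
METRICS = {
    "BOARD_MEETING_AGREEMENT": (0.92, 0.95, 0.93, 3975),
    "CONSULTING_AGREEMENT": (0.81, 0.85, 0.83, 1430),
    "CUSTOMER_LIST_AGREEMENT": (0.90, 0.88, 0.89, 4488),
    "DISTRIBUTION_PARTNER_AGREEMENT": (0.73, 0.63, 0.68, 6696),
    "EMPLOYEE_AGREEMENT": (0.85, 0.84, 0.85, 1310),
    "ENTERPRISE_AGREEMENT": (0.18, 0.70, 0.29, 1501),
    "ENTERPRISE_LICENSE_AGREEMENT": (0.92, 0.78, 0.84, 7967),
    "EXECUTIVE_SEVERANCE_AGREEMENT": (0.97, 0.88, 0.92, 4795),
    "FINANCIAL_REPORT_AGREEMENT": (0.93, 0.99, 0.96, 4686),
    "HARMFUL_ADVICE": (0.92, 0.94, 0.93, 361),
    "INTERNAL_PRODUCT_ROADMAP_AGREEMENT": (0.94, 0.98, 0.96, 3740),
    "LOAN_AND_SECURITY_AGREEMENT": (0.93, 0.97, 0.95, 5833),
    "MEDICAL_ADVICE": (0.93, 1.00, 0.96, 643),
    "MERGER_AGREEMENT": (0.93, 0.45, 0.61, 6557),
    "NDA_AGREEMENT": (0.68, 0.91, 0.78, 1352),
    "NORMAL_TEXT": (0.95, 0.94, 0.95, 5811),
    "PATENT_APPLICATION_FILLINGS_AGREEMENT": (0.96, 0.99, 0.98, 5608),
    "PRICE_LIST_AGREEMENT": (0.76, 0.79, 0.77, 5044),
    "SETTLEMENT_AGREEMENT": (0.76, 0.78, 0.77, 5377),
    "SEXUAL_CONTENT": (0.92, 0.97, 0.94, 4356),
    "SEXUAL_INCIDENT_REPORT": (0.99, 0.94, 0.96, 4750),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rows = list(METRICS.values())
support_total = sum(s for _, _, _, s in rows)

# Largest gap between recomputed and printed F1; it should be rounding-sized.
max_gap = max(abs(f1_score(p, r) - f) for p, r, f, _ in rows)

macro_f1 = sum(f for _, _, f, _ in rows) / len(rows)
weighted_precision = sum(p * s for p, _, _, s in rows) / support_total
weighted_recall = sum(r * s for _, r, _, s in rows) / support_total
weighted_f1 = sum(f * s for _, _, f, s in rows) / support_total

print(f"support: {support_total:,}")
print(f"max |recomputed F1 - table F1|: {max_gap:.4f}")
print(f"macro F1: {macro_f1:.3f}")
print(f"weighted P/R/F1: {weighted_precision:.3f} / {weighted_recall:.3f} / {weighted_f1:.3f}")
```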

Results

The model's performance is summarized by precision, recall, and F1-score, reported per label for all 21 labels in the table above. On the test data, the model achieved an accuracy of 0.8424, a weighted-average precision of 0.8794, and a weighted-average recall of 0.8424 (in single-label classification, weighted recall always equals accuracy). The weighted-average F1-score is 0.8505; it is the support-weighted mean of the per-label F1-scores, each of which is the harmonic mean of that label's precision and recall.
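The 0.8505 F1 figure matches the "weighted avg" row of the Metrics table, so it is the support-weighted mean of the per-label F1-scores. It is not the harmonic mean of the weighted precision (0.8794) and weighted recall (0.8424), which works out to about 0.8605, as this short calculation shows.

```python
def harmonic_mean(a: float, b: float) -> float:
    """F1-style harmonic mean of two scores."""
    return 2 * a * b / (a + b)


weighted_precision = 0.8794    # reported weighted-average precision
weighted_recall = 0.8424       # reported weighted-average recall (equals accuracy)
reported_weighted_f1 = 0.8505  # reported weighted-average F1

hm = harmonic_mean(weighted_precision, weighted_recall)
print(round(hm, 4))  # 0.8605
print(f"difference from reported F1: {hm - reported_weighted_f1:.4f}")
```

Averaging F1 per label first lets weak labels such as ENTERPRISE_AGREEMENT (F1 0.29) pull the result down more than their precision and recall alone would.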

The evaluation loss, which measures the discrepancy between the model's predicted probabilities and the true labels, is 0.6815; lower values indicate better performance.
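The card does not name the loss function. Assuming it is the mean cross-entropy that transformers uses by default for single-label sequence classification, the loss is the average of -log p(true label) over the test set. Under that assumption, exp(-0.6815) ≈ 0.506 is the geometric mean of the probability the model assigned to the correct label. The probabilities in the sketch are made up.

```python
import math


def mean_cross_entropy(true_label_probs: list[float]) -> float:
    """Average negative log-probability assigned to the correct label."""
    return -sum(math.log(p) for p in true_label_probs) / len(true_label_probs)


# Made-up probabilities the classifier gave to the correct label for four texts.
probs = [0.95, 0.80, 0.60, 0.10]
print(round(mean_cross_entropy(probs), 4))

# Geometric-mean probability on the correct label implied by the reported loss.
print(round(math.exp(-0.6815), 3))  # 0.506
```

A single confident mistake (0.10 above) dominates the average, which is why cross-entropy can stay well above zero even at 84% accuracy.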

During evaluation, the model processed 97.684 samples per second, or 0.764 evaluation steps per second, over a total runtime of 883.2545 seconds.
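The throughput figures are mutually consistent: samples per second times runtime recovers the 86,280 test entries, and samples per second divided by steps per second suggests about 128 samples per evaluation step. That batch size is an inference from these numbers, not a value stated in the card.

```python
import math

samples_per_second = 97.684
runtime_seconds = 883.2545
steps_per_second = 0.764

samples = samples_per_second * runtime_seconds
steps = steps_per_second * runtime_seconds
samples_per_step = samples_per_second / steps_per_second

print(f"samples evaluated ≈ {samples:,.0f}")
print(f"evaluation steps ≈ {steps:.0f}")
print(f"implied samples per step ≈ {samples_per_step:.1f}")

# With 128 samples per step, 86,280 entries need ceil(86280 / 128) steps.
print(math.ceil(86280 / 128))  # 675
```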