
sentiment_analysis_model_v2

This model is a fine-tuned version of distilbert/distilbert-base-uncased and the second iteration of tashrifmahmud/sentiment_analysis_model, trained on the IMDB and Rotten Tomatoes datasets. It achieves the following results on the evaluation set (the published model is the epoch-1 checkpoint):

  • Loss: 0.3682
  • Accuracy: 0.8396
  • Precision: 0.8267
  • Recall: 0.8593
  • F1: 0.8427

Model description

This model is a fine-tuned version of the DistilBERT transformer architecture for sentiment analysis. It was trained on the IMDB dataset for binary classification, distinguishing between positive and negative sentiment in movie reviews. The model has been further fine-tuned on the Rotten Tomatoes dataset to improve its generalization and performance on movie-related text.

  • Architecture: DistilBERT (a distilled version of BERT for faster inference).
  • Task: Sentiment Analysis (binary classification: positive or negative sentiment).
  • Pre-training: Inherits DistilBERT's general-purpose pre-training, distilled from BERT's original training corpus.
  • Fine-tuning: Fine-tuned using both IMDB and Rotten Tomatoes datasets.
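
For a quick start, the model can be loaded through the transformers pipeline API. The snippet below is a minimal sketch assuming the checkpoint id shown on this card; the exact label names returned depend on the model's config (id2label), so they are not guaranteed here.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="tashrifmahmud/sentiment_analysis_model_v2",
)

# Binary sentiment prediction on a movie review.
result = classifier("A beautifully shot film, but the script falls completely flat.")
print(result)  # e.g. [{'label': ..., 'score': ...}] - label names come from the model config
```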

Intended uses & limitations

Intended uses:

This model is suitable for classifying the sentiment of text, particularly movie reviews. It can be used in applications such as:

  • Sentiment analysis of social media posts, customer reviews, or product feedback.
  • Analysis of movie reviews, comments, or related textual data.
  • A building block for sentiment-aware recommendation systems, content moderation tools, or market research.

Limitations:

The model is specifically tuned for movie-related sentiment analysis, so its performance on non-movie text (e.g., general product reviews or news articles) may not be optimal, and it may struggle with highly domain-specific terminology outside movie contexts. Like most sentiment analysis models, it can also misread sarcasm, irony, and nuanced expressions of sentiment.

Training and evaluation data

Training data:

  • IMDB dataset: The model was initially trained on the IMDB movie reviews dataset, which consists of 25,000 reviews labeled as positive or negative.
  • Rotten Tomatoes dataset: To improve the model's performance and generalization, it was further fine-tuned on the Rotten Tomatoes dataset, which contains movie reviews and ratings.

Evaluation data:

  • Test data from Rotten Tomatoes: Evaluation was performed on the test split of the Rotten Tomatoes dataset to assess the model's ability to generalize to unseen movie reviews.
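
Both corpora are available on the Hugging Face Hub. Below is a minimal sketch of loading them with the datasets library, assuming the standard hub ids imdb and rotten_tomatoes (both use binary sentiment labels):

```python
from datasets import load_dataset

# IMDB: 25,000 labeled training reviews plus a held-out test split.
imdb = load_dataset("imdb")

# Rotten Tomatoes: short movie reviews with binary sentiment labels;
# its test split is what the evaluation numbers on this card refer to.
rotten = load_dataset("rotten_tomatoes")

print(imdb["train"].num_rows, rotten["test"].num_rows)
```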

Improvement Metrics after fine-tuning on Rotten Tomatoes:

  • Accuracy increased from 82.1% to 84.3%.
  • Precision improved from 81.61% to 84.81%.
  • F1 Score saw a boost from 83.62% to 85.37%.
  • Loss decreased from 0.4268 to 0.3621.
  • Runtime was reduced from 17.7 seconds to 15.61 seconds.
  • The model's throughput improved, with samples per second increasing from 56.49 to 64.05, and steps per second from 7.06 to 8.01.
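
For reference, metrics of this kind can be recomputed from label/prediction pairs. The card does not state which implementation produced the numbers above, so the scikit-learn-based sketch below is an assumption, not the author's script:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def sentiment_metrics(labels, predictions):
    """Accuracy, precision, recall, and F1 for binary (0/1) sentiment labels."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, predictions),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy check with hypothetical predictions:
print(sentiment_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```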

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 3
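
These hyperparameters map onto the transformers Trainer API roughly as follows. This is a hedged reconstruction, not the author's actual training script: the output path and the per-epoch evaluation/save strategy are assumptions.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

# v2 starts from the first-iteration model (per the card's description).
model = AutoModelForSequenceClassification.from_pretrained(
    "tashrifmahmud/sentiment_analysis_model", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("tashrifmahmud/sentiment_analysis_model")

training_args = TrainingArguments(
    output_dir="sentiment_analysis_model_v2",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    optim="adamw_torch",              # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    eval_strategy="epoch",            # assumption: evaluate and save once per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,      # one way to end up with the epoch-1 checkpoint
)
```

Passing these arguments to a Trainer together with the tokenized datasets reproduces the setup described above; with load_best_model_at_end=True the trainer keeps the checkpoint with the lowest validation loss, which matches the epoch-1 result reported below.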

Training results

As the table below shows, the validation loss rises after epoch 1.0, indicating overfitting; the best model, from epoch 1.0 (checkpoint-534), is therefore the one pushed to the Hub.

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|---------------|-------|------|-----------------|----------|-----------|--------|--------|
| 0.365         | 1.0   | 534  | 0.3682          | 0.8396   | 0.8267    | 0.8593 | 0.8427 |
| 0.2804        | 2.0   | 1068 | 0.3892          | 0.8452   | 0.8525    | 0.8349 | 0.8436 |
| 0.2301        | 3.0   | 1602 | 0.4342          | 0.8443   | 0.8404    | 0.8499 | 0.8451 |

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
