metadata
library_name: transformers
datasets:
  - stanfordnlp/imdb
metrics:
  - accuracy
tags:
  - PyTorch
model-index:
  - name: distilbert-imdb
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: imdb
          type: imdb
          args: plain_text
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9316
pipeline_tag: text-classification
license: apache-2.0
language:
  - en

distilbert-imdb

This is a fine-tuned version of distilbert-base-uncased on the IMDb dataset.

Performance

  • Loss: 0.1958
  • Accuracy: 0.932
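
For reference, accuracy is the fraction of examples whose predicted label matches the reference label. The sketch below shows that calculation on made-up labels. The function name `accuracy` and the sample data are illustrative assumptions, not taken from the evaluation script used for this model.

```python
def accuracy(predictions, references):
    """Fraction of predictions that equal the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy labels (1 = positive, 0 = negative), purely illustrative.
preds = [1, 0, 1, 1]
golds = [1, 0, 0, 1]
print(accuracy(preds, golds))  # 3 of 4 correct -> 0.75
```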

How to Get Started with the Model

Use the code below to get started with the model:

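from transformers import DistilBertTokenizer, pipeline

# Load the tokenizer of the base checkpoint the model was fine-tuned from.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
# Build a sentiment-analysis pipeline around the fine-tuned model.
classifier = pipeline("sentiment-analysis", model="3oclock/distilbert-imdb", tokenizer=tokenizer)
result = classifier("I love this movie!")
print(result)  # a list with one {"label": ..., "score": ...} dict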

Model Details

Model Description

This model card describes a 🤗 Transformers model fine-tuned on the IMDb dataset for binary sentiment classification.

  • Developed by: Ge Li
  • Model type: DistilBERT for Sequence Classification
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: distilbert-base-uncased

Uses

Direct Use

This model can be used directly for sentiment analysis on movie reviews. It is best suited for classifying English-language text that is similar in nature to movie reviews.
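
Full-length reviews can exceed DistilBERT's 512-token input limit, so it may help to enable truncation when classifying real reviews in bulk. The helper below is a minimal sketch under stated assumptions: `classify_reviews` is a hypothetical name, and it accepts any callable that behaves like a text-classification pipeline (a list of strings in, one `{"label": ..., "score": ...}` dict per string out, with tokenizer options such as `truncation` accepted as keyword arguments).

```python
def classify_reviews(classifier, reviews, max_length=512):
    """Run a text-classification callable over reviews with truncation.

    `classifier` is expected to behave like a transformers
    text-classification pipeline: it takes a list of strings and returns
    one {"label": ..., "score": ...} dict per input.
    """
    reviews = list(reviews)
    if not reviews:
        return []
    # These keyword arguments are meant to reach the tokenizer so that
    # inputs longer than max_length tokens are cut off, not rejected.
    results = classifier(reviews, truncation=True, max_length=max_length)
    return [
        {"text": text, "label": r["label"], "score": r["score"]}
        for text, r in zip(reviews, results)
    ]
```

With the real model, the `classifier` argument would be the pipeline object built in "How to Get Started with the Model". The label strings returned depend on the model's configuration, so inspect one result before mapping labels to sentiments.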

Downstream Use

This model can be fine-tuned on other sentiment analysis tasks or adapted for tasks like text classification in domains similar to IMDb movie reviews.

Out-of-Scope Use

The model may not perform well on non-English text or text that is significantly different in style and content from the IMDb dataset (e.g., technical documents, social media posts).

Bias, Risks, and Limitations

Bias

The IMDb dataset consists primarily of English-language movie reviews, so the model may not generalize well to other languages or other types of reviews.

Risks

Misclassification in sentiment analysis can lead to incorrect conclusions in applications relying on this model.
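
One possible way to reduce the impact of misclassifications is to act only on confident predictions and route the rest to manual review. The sketch below assumes predictions shaped like the pipeline's output dicts. The function name `label_or_abstain` and the default threshold of 0.9 are illustrative assumptions, not values validated for this model.

```python
def label_or_abstain(prediction, threshold=0.9):
    """Return the predicted label, or None when the score is below threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return prediction["label"] if prediction["score"] >= threshold else None


# Dicts shaped like pipeline output; labels and scores are made up.
confident = {"label": "POSITIVE", "score": 0.98}
borderline = {"label": "NEGATIVE", "score": 0.55}
print(label_or_abstain(confident))   # POSITIVE
print(label_or_abstain(borderline))  # None
```

A suitable threshold would need to be chosen on held-out data from the target application, since a softmax score is not guaranteed to be a calibrated probability.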

Limitations

The model was trained on a dataset of movie reviews, so it may not perform as well on other types of text data.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.