
Model overview

This model is the baseline model for the awesome-japanese-nlp-classification-dataset. It was trained on the training split of this dataset, with the best checkpoint selected on the development split and final evaluation performed on the test split. The following table shows the evaluation results.

| Label        | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.98      | 0.99   | 0.98     | 796     |
| 1            | 0.79      | 0.70   | 0.74     | 60      |
| Accuracy     |           |        | 0.97     | 856     |
| Macro Avg    | 0.89      | 0.84   | 0.86     | 856     |
| Weighted Avg | 0.96      | 0.97   | 0.97     | 856     |
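As a sanity check on the table: the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support. A minimal sketch using the per-class F1 values from the table (small discrepancies against the table are expected, since the inputs are already rounded):

```python
# Per-class F1 scores and supports taken from the table above
f1_class0, support0 = 0.98, 796
f1_class1, support1 = 0.74, 60

# Macro average: unweighted mean over the two classes
macro_f1 = (f1_class0 + f1_class1) / 2
print(round(macro_f1, 2))  # 0.86

# Weighted average: mean weighted by class support
weighted_f1 = (f1_class0 * support0 + f1_class1 * support1) / (support0 + support1)
print(round(weighted_f1, 2))
```

The large gap between macro F1 (0.86) and weighted F1 (0.97) reflects the class imbalance: class 0 dominates the test set, so the weighted average is pulled toward its higher score.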

Usage

Please install the following libraries (the pipeline requires a backend such as PyTorch).

pip install transformers torch

You can easily use the classification model with the pipeline function.

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Relevant sample
text = "ディープラーニングによる自然言語処理(共立出版)のサポートページです"
label = pipe(text)
print(label) # [{'label': '1', 'score': 0.9910495281219482}]

# Not Relevant sample
text = "AIイラストを管理するデスクトップアプリ"
label = pipe(text)
print(label) # [{'label': '0', 'score': 0.9986791014671326}]
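The pipeline also accepts a list of texts and returns one prediction per input. If you only want confidently relevant items, you can filter on the predicted label and score. A minimal sketch operating on hardcoded outputs in the same format as the pipeline returns above (the 0.9 threshold is an arbitrary choice for illustration, not a recommended value):

```python
# Predictions in the format returned by the pipeline above (hardcoded for illustration)
predictions = [
    {"label": "1", "score": 0.9910495281219482},
    {"label": "0", "score": 0.9986791014671326},
    {"label": "1", "score": 0.65},
]

# Keep only high-confidence "relevant" (label 1) predictions
threshold = 0.9
relevant = [p for p in predictions if p["label"] == "1" and p["score"] >= threshold]
print(len(relevant))  # 1
```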

Evaluation

Please install the following libraries.

pip install evaluate scikit-learn datasets transformers torch

import evaluate
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

# Evaluation dataset
dataset = load_dataset("taishi-i/awesome-japanese-nlp-classification-dataset")

# Text classification model
pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Evaluation metric
f1 = evaluate.load("f1")

# Predict process
predicted_labels = []
for text in dataset["test"]["text"]:
    prediction = pipe(text)
    predicted_label = prediction[0]["label"]
    predicted_labels.append(int(predicted_label))

score = f1.compute(
    predictions=predicted_labels, references=dataset["test"]["label"]
)
print(score)

report = classification_report(
    y_true=dataset["test"]["label"], y_pred=predicted_labels
)
print(report)
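For reference, the F1 score reported by the script above can be derived directly from confusion-matrix counts. A minimal pure-Python sketch with toy labels (illustrative data, not the actual test-set outputs):

```python
# Toy gold labels and predictions (illustrative only)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Confusion-matrix counts for the positive class (label 1)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# F1 is the harmonic mean of precision and recall
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * precision * recall / (precision + recall)
print(precision, recall, f1_score)  # 0.75 0.75 0.75
```

These are the same quantities that evaluate's f1 metric and scikit-learn's classification_report compute for the positive class.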

License

This model was trained on a dataset collected via the GitHub API, in accordance with the GitHub Acceptable Use Policies - 7. Information Usage Restrictions and the GitHub Terms of Service - H. API Terms. It should be used solely for research verification purposes, and adhering to GitHub's regulations is mandatory.
