Model overview
This is the baseline model for awesome-japanese-nlp-classification-dataset. It was trained on the dataset's training data, the saved checkpoint was selected using the development data, and it was evaluated on the test data. The following table shows the evaluation results on the test data.
Label | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 | 0.98 | 0.99 | 0.98 | 796 |
1 | 0.79 | 0.70 | 0.74 | 60 |
Accuracy | | | 0.97 | 856 |
Macro Avg | 0.89 | 0.84 | 0.86 | 856 |
Weighted Avg | 0.96 | 0.97 | 0.97 | 856 |
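As a quick consistency check on the table above, the F1 column can be recomputed from the precision and recall columns, since F1 is the harmonic mean of the two. The sketch below uses a helper named `f1_from`, which is an illustrative name and not part of the model's code. Because the inputs are the rounded table values, the results only match the table to two decimal places.

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded per-label values copied from the table above.
f1_label_0 = f1_from(0.98, 0.99)
f1_label_1 = f1_from(0.79, 0.70)

# The macro average is the unweighted mean of the per-label scores.
macro_f1 = (f1_label_0 + f1_label_1) / 2

print(f"{f1_label_0:.2f}")  # 0.98
print(f"{f1_label_1:.2f}")  # 0.74
print(f"{macro_f1:.2f}")    # 0.86
```

The gap between the two labels reflects the class imbalance in the test data: label 1 has only 60 of the 856 examples, so the macro average (0.86) is noticeably lower than the accuracy (0.97).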
Usage
Please install the following library.
pip install transformers
You can classify text with the transformers pipeline function.
from transformers import pipeline
pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)
# Relevant sample
text = "ディープラーニングによる自然言語処理(共立出版)のサポートページです"
label = pipe(text)
print(label) # [{'label': '1', 'score': 0.9910495281219482}]
# Not Relevant sample
text = "AIイラストを管理するデスクトップアプリ"
label = pipe(text)
print(label) # [{'label': '0', 'score': 0.9986791014671326}]
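The pipeline returns a list of dictionaries with a string label and a score. A minimal sketch of turning that output into a boolean might look like the following. It assumes, based on the sample comments above, that label '1' means relevant and '0' means not relevant. The helper name `to_relevance` is hypothetical, and the inputs are the two outputs shown above, so the snippet runs without downloading the model.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]


def to_relevance(prediction: List[Prediction]) -> Tuple[bool, float]:
    """Map a single-text pipeline result to (is_relevant, score).

    Assumes label "1" marks a relevant text, as in the samples above.
    """
    top = prediction[0]
    return top["label"] == "1", float(top["score"])


# Outputs copied from the usage example above.
relevant_output = [{"label": "1", "score": 0.9910495281219482}]
not_relevant_output = [{"label": "0", "score": 0.9986791014671326}]

print(to_relevance(relevant_output))      # (True, 0.9910495281219482)
print(to_relevance(not_relevant_output))  # (False, 0.9986791014671326)
```

Note that the score is the confidence for the predicted label, not the probability of label '1', so a high score on a '0' prediction means the model is confident the text is not relevant.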
Evaluation
Please install the following libraries.
pip install evaluate scikit-learn datasets transformers torch
import evaluate
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline
# Evaluation dataset
dataset = load_dataset("taishi-i/awesome-japanese-nlp-classification-dataset")
# Text classification model
pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)
# Evaluation metric
f1 = evaluate.load("f1")
# Predict process
predicted_labels = []
for text in dataset["test"]["text"]:
    prediction = pipe(text)
    predicted_label = prediction[0]["label"]
    predicted_labels.append(int(predicted_label))
score = f1.compute(
    predictions=predicted_labels, references=dataset["test"]["label"]
)
print(score)
report = classification_report(
    y_true=dataset["test"]["label"], y_pred=predicted_labels
)
print(report)
print(report)
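The prediction loop above calls the pipeline once per text. Since a transformers text-classification pipeline also accepts a list of texts and returns one result per text, the same step could be written over chunks. The sketch below is one possible arrangement, not the author's evaluation code: `predict_labels` and `stub_classifier` are hypothetical names, and the stub is a keyword rule standing in for the real pipeline so the snippet runs offline.

```python
from typing import Any, Callable, Dict, List, Sequence


def predict_labels(
    classify: Callable[[List[str]], List[Dict[str, Any]]],
    texts: Sequence[str],
    batch_size: int = 32,
) -> List[int]:
    """Run `classify` over `texts` in chunks and return integer labels."""
    labels: List[int] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        # One result dictionary is expected per input text.
        for prediction in classify(batch):
            labels.append(int(prediction["label"]))
    return labels


def stub_classifier(batch: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the pipeline: a keyword rule, not the model."""
    return [
        {"label": "1" if "自然言語処理" in text else "0", "score": 1.0}
        for text in batch
    ]


sample_texts = [
    "ディープラーニングによる自然言語処理(共立出版)のサポートページです",
    "AIイラストを管理するデスクトップアプリ",
]
print(predict_labels(stub_classifier, sample_texts, batch_size=1))  # [1, 0]
```

With the real model, `pipe` would be passed in place of `stub_classifier`, and the returned list could be used as `predictions` for `f1.compute` and as `y_pred` for `classification_report`.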
License
This model was trained on a dataset collected via the GitHub API, subject to the GitHub Acceptable Use Policies (7. Information Usage Restrictions) and the GitHub Terms of Service (H. API Terms). It should be used solely for research verification purposes, and users must comply with GitHub's regulations.