Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/savasy/bert-turkish-text-classification/README.md
README.md
ADDED
@@ -0,0 +1,102 @@
---
language: tr
---

# Turkish Text Classification

This model is a fine-tuned version of Turkish BERT (https://github.com/stefan-it/turkish-bert), trained on text classification data with the following 7 categories:

```
code_to_label = {
    'LABEL_0': 'dunya ',
    'LABEL_1': 'ekonomi ',
    'LABEL_2': 'kultur ',
    'LABEL_3': 'saglik ',
    'LABEL_4': 'siyaset ',
    'LABEL_5': 'spor ',
    'LABEL_6': 'teknoloji ',
}
```
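
The `LABEL_n` ids above follow the alphabetical order of the category names. A minimal sketch of how such a mapping could be derived, assuming the labels were encoded by sorting the category names (the default behavior of `pd.Categorical`). `build_code_to_label` is a hypothetical helper, not part of the model or of transformers, and the trailing spaces of the original values are omitted here:

```python
def build_code_to_label(category_names):
    """Map 'LABEL_<i>' ids to category names in sorted (alphabetical) order."""
    ordered = sorted(set(category_names))
    return {f"LABEL_{i}": name for i, name in enumerate(ordered)}


# category names from the mapping above, deliberately given in arbitrary order
categories = ["spor", "dunya", "teknoloji", "kultur", "saglik", "ekonomi", "siyaset"]
derived = build_code_to_label(categories)
print(derived["LABEL_2"])  # kultur
```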

## Data
The following Turkish benchmark dataset was used for fine-tuning:

https://www.kaggle.com/savasy/ttc4900

## Quick Start

Begin by installing transformers as follows:
> pip install transformers

```
# import libraries
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("savasy/bert-turkish-text-classification")

# build and load the model; this takes time depending on your internet connection
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-turkish-text-classification")

# make the pipeline ("sentiment-analysis" is the pipeline alias for text classification)
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# apply the model
nlp("bla bla")
# [{'label': 'LABEL_2', 'score': 0.4753005802631378}]

# map the model's label ids to category names
code_to_label = {
    'LABEL_0': 'dunya ',
    'LABEL_1': 'ekonomi ',
    'LABEL_2': 'kultur ',
    'LABEL_3': 'saglik ',
    'LABEL_4': 'siyaset ',
    'LABEL_5': 'spor ',
    'LABEL_6': 'teknoloji ',
}

code_to_label[nlp("bla bla")[0]['label']]
# > 'kultur '
```
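
To turn raw pipeline output into readable category names in one step, a small wrapper can help. The sketch below is illustrative only: `decode_predictions` is a hypothetical helper, and it operates on the list-of-dicts format shown above, so it can be tried without downloading the model. It also strips the trailing space present in the mapping values.

```python
code_to_label = {
    'LABEL_0': 'dunya ',
    'LABEL_1': 'ekonomi ',
    'LABEL_2': 'kultur ',
    'LABEL_3': 'saglik ',
    'LABEL_4': 'siyaset ',
    'LABEL_5': 'spor ',
    'LABEL_6': 'teknoloji ',
}


def decode_predictions(predictions, mapping=code_to_label):
    """Return (category, score) pairs for pipeline-style predictions."""
    return [(mapping[p["label"]].strip(), p["score"]) for p in predictions]


# sample output copied from the Quick Start example above
sample = [{'label': 'LABEL_2', 'score': 0.4753005802631378}]
print(decode_predictions(sample))  # [('kultur', 0.4753005802631378)]
```

With the real pipeline, the same helper would be called as `decode_predictions(nlp("bla bla"))`.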

## How the model was trained

```

# load data for Turkish text classification
import pandas as pd
# https://www.kaggle.com/savasy/ttc4900
df = pd.read_csv("7allV03.csv")
df.columns = ["labels", "text"]
df.labels = pd.Categorical(df.labels)

# split df into training and evaluation sets
train_df = ...
eval_df = ...

# model
from simpletransformers.classification import ClassificationModel
import torch
import sklearn.metrics

# use the GPU if one is available
cuda_available = torch.cuda.is_available()

model_args = {
    "use_early_stopping": True,
    "early_stopping_delta": 0.01,
    "early_stopping_metric": "mcc",
    "early_stopping_metric_minimize": False,
    "early_stopping_patience": 5,
    "evaluate_during_training_steps": 1000,
    "fp16": False,
    "num_train_epochs": 3
}

model = ClassificationModel(
    "bert",
    "dbmdz/bert-base-turkish-cased",
    use_cuda=cuda_available,
    args=model_args,
    num_labels=7
)
model.train_model(train_df, acc=sklearn.metrics.accuracy_score)
```
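
The early-stopping settings in `model_args` can be read roughly as follows: training stops once the evaluation metric (here `mcc`, which is maximized) has failed to improve by more than `early_stopping_delta` for `early_stopping_patience` consecutive evaluations. The sketch below illustrates that logic in plain Python. It is a simplified illustration, not simpletransformers' actual implementation; `stopping_step` and the score list are hypothetical.

```python
def stopping_step(scores, patience=5, delta=0.01, minimize=False):
    """Return the index of the evaluation at which training would stop, or None."""
    best = None
    bad_evals = 0  # consecutive evaluations without sufficient improvement
    for i, score in enumerate(scores):
        if best is None:
            best = score
            continue
        improvement = (best - score) if minimize else (score - best)
        if improvement > delta:
            best = score
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None


# made-up MCC values, one per evaluation round
mcc_scores = [0.60, 0.72, 0.80, 0.805, 0.80, 0.79, 0.806, 0.80, 0.81]
print(stopping_step(mcc_scores))  # 7
```

In this made-up run the best score is reached at the third evaluation, and none of the next five evaluations beats it by more than 0.01, so training would stop at index 7.
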

For other training models, please check https://simpletransformers.ai/


For detailed usage of Turkish Text Classification, please check the [Python notebook](https://github.com/savasy/TurkishTextClassification/blob/master/Bert_base_Text_Classification_for_Turkish.ipynb)