metadata
license: apache-2.0
language:
  - tr
pipeline_tag: text-classification
tags:
  - job advertisement
  - turkish bert
  - bert-based
  - StratifiedKFold


About the model

The model was trained on 15,451 real job advertisements.

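The model assigns each text to one of four classes:

  • Uygun İlan (suitable advertisement)
  • Is Ilani Degil (not a job advertisement)
  • Mustehcen (obscene)
  • Cift Pozisyon (double position)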

Training setup and results:

  • The model is based on Turkish BERT.
  • Validation used 5-fold stratified cross-validation, StratifiedKFold(5).
  • Per-fold results: [0.806858621805241, 0.8912621359223301, 0.9440129449838188, 0.9750809061488673, 0.9851132686084142]
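To illustrate what stratification means here, the following is a minimal pure-Python sketch that deals the samples of each class round-robin across five folds, so every fold keeps roughly the same class mix. It is not scikit-learn's implementation and not the authors' training code; the helper `stratified_folds` and the toy label counts are hypothetical. In practice one would presumably use scikit-learn's `StratifiedKFold(n_splits=5)`.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=5):
    """Assign sample indices to folds so each fold keeps a similar class mix."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Toy labels (hypothetical counts, not the real training data).
labels = (
    ["Uygun İlan"] * 20
    + ["Is Ilani Degil"] * 10
    + ["Mustehcen"] * 5
    + ["Cift Pozisyon"] * 5
)
folds = stratified_folds(labels)

for fold_number, validation_indices in enumerate(folds, start=1):
    class_counts = Counter(labels[i] for i in validation_indices)
    print(fold_number, dict(class_counts))
```

Each fold serves once as the validation set while the remaining four are used for training, which yields one result per fold.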

Mean-Precision: 0.9204655754937342
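The reported mean can be reproduced from the five per-fold values listed above. This is only an arithmetic check; the variable names are illustrative.

```python
from statistics import mean

# Per-fold results copied from the list above.
fold_results = [
    0.806858621805241,
    0.8912621359223301,
    0.9440129449838188,
    0.9750809061488673,
    0.9851132686084142,
]

mean_result = mean(fold_results)
spread = max(fold_results) - min(fold_results)

print(f"mean:   {mean_result:.4f}")  # mean:   0.9205
print(f"spread: {spread:.4f}")       # spread: 0.1783
```

The spread between the weakest and strongest fold is fairly large, so the mean alone hides some variation between folds.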

| | Uygun İlan | Is Ilani Degil | Mustehcen | Cift Pozisyon |
|---|---|---|---|---|
| Precision | 0.986 | 0.996 | 0.966 | 0.970 |
| Recall | 0.992 | 0.986 | 0.966 | 0.959 |
| F1 Score | 0.989 | 0.991 | 0.966 | 0.965 |

Accuracy: 0.975
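As a consistency check, the F1 score of each class can be recomputed from its precision and recall with F1 = 2PR / (P + R). The sketch below uses the rounded values from the table, so small differences in the third decimal place are expected; the dictionary layout and function name are illustrative.

```python
# (precision, recall, reported F1) per class, copied from the table above.
reported = {
    "Uygun İlan": (0.986, 0.992, 0.989),
    "Is Ilani Degil": (0.996, 0.986, 0.991),
    "Mustehcen": (0.966, 0.966, 0.966),
    "Cift Pozisyon": (0.970, 0.959, 0.965),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


computed = {}
for label, (precision, recall, reported_f1) in reported.items():
    computed[label] = f1_score(precision, recall)
    print(f"{label}: computed {computed[label]:.4f}, reported {reported_f1:.3f}")
```

All four recomputed values agree with the table to within 0.001, which is consistent with the inputs having been rounded to three decimals.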

Example

!IMPORTANT_HINT: The text passed to pipe must not contain Turkish characters; replace them with their ASCII equivalents first.

from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

MODEL_NAME = "nanelimon/bert-base-turkish-job-advertisement"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# Maps lowercase Turkish-specific letters to their ASCII equivalents.
TURKISH_TO_ASCII = str.maketrans('öıüçğş', 'oiucgs')


def set_sentence(sentence: str) -> str:
    """Lowercase the text and replace Turkish characters with ASCII ones."""
    # Replace 'İ' before lowercasing: str.lower() turns it into 'i' plus a combining dot.
    return sentence.replace('İ', 'i').lower().translate(TURKISH_TO_ASCII)


print(pipe(set_sentence('Fiziği düzgün 17 yaş kızlar aranıyor')))

Result:

output: [{'label': 'Mustehcen', 'score': 0.9992677569389343}]
  • label: the class the model assigns to the Turkish text.
  • score: the model's confidence, between 0 and 1, that the text belongs to that class.
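The pipeline returns a list with one dictionary per input text. A minimal sketch of reading that structure is shown below, using the sample output from the example so that it runs without downloading the model. The `summarize` helper and the 0.90 review threshold are hypothetical and not part of the model or the transformers library.

```python
# Sample pipeline output copied from the example above.
output = [{'label': 'Mustehcen', 'score': 0.9992677569389343}]

# Hypothetical cut-off: predictions below it are flagged for manual review.
REVIEW_THRESHOLD = 0.90


def summarize(prediction: dict, threshold: float = REVIEW_THRESHOLD):
    """Return the label, the score, and whether the score falls below the threshold."""
    label = prediction['label']
    score = prediction['score']
    needs_review = score < threshold
    return label, score, needs_review


label, score, needs_review = summarize(output[0])
print(f"{label} ({score:.1%}), manual review: {needs_review}")
# Mustehcen (99.9%), manual review: False
```

A suitable threshold depends on the application and would need to be chosen on held-out data.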

Authors

License

apache-2.0

Free Software, Hell Yeah!