metadata
license: mit
language:
  - tr
pipeline_tag: text-classification
tags:
  - text-classification

Model Description

This model was fine-tuned from the dbmdz/bert-base-turkish-128k-uncased model.

This model was created to detect gibberish sentences such as "adssnfjnfjn". It is a simple binary classifier that labels a sentence as either gibberish or real.

Usage

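import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Use the first GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")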
model = AutoModelForSequenceClassification.from_pretrained("TURKCELL/gibberish-detection-model-tr")
tokenizer = AutoTokenizer.from_pretrained("TURKCELL/gibberish-detection-model-tr", do_lower_case=True, use_fast=True)

model.to(device)

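def get_result_for_one_sample(model, tokenizer, device, sample):
    # Map the predicted class index to a readable label.
    labels = {
        1: 'gibberish',
        0: 'real'
    }
    # Tokenize the single sentence as a batch of one and move it to the model's device.
    test_sample = tokenizer([sample], padding=True, truncation=True, max_length=256, return_tensors='pt').to(device)
    output = model(**test_sample)
    # Pick the class with the highest logit.
    y_pred = np.argmax(output.logits.detach().to('cpu').numpy(), axis=1)
    return labels[int(y_pred[0])]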

sentence = "nabeer rdahdaajdajdnjnjf"
result = get_result_for_one_sample(model, tokenizer, device, sentence)
print(result)
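
The helper above returns only the label. If a confidence score is also useful, the two class logits can be turned into probabilities with a softmax. The following is a minimal sketch in pure Python, assuming the same label mapping as the snippet above (0 = real, 1 = gibberish). The function name `label_with_confidence` and the example logits are hypothetical and are not part of the model or the transformers API.

```python
import math

# Label mapping taken from the usage snippet above.
LABELS = {0: "real", 1: "gibberish"}

def label_with_confidence(logits):
    """Return (label, probability) for one pair of class logits.

    `logits` is a sequence of two floats, one per class.
    """
    # Subtract the max before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class (the first one wins on a tie).
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Made-up logits for illustration only; real values come from the model.
label, confidence = label_with_confidence([-1.2, 2.3])
print(label, round(confidence, 3))
```

With the real model, something like `output.logits[0].detach().cpu().tolist()` could be passed in place of the made-up values.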