metadata
license: mit
language:
- tr
pipeline_tag: text-classification
tags:
- text-classification
Model Description
This model was fine-tuned from the dbmdz/bert-base-turkish-128k-uncased model.
It was created to detect gibberish sentences such as "adssnfjnfjn". It is a simple binary classifier that labels a sentence as either gibberish or real.
Usage
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AutoModelForSequenceClassification.from_pretrained("TURKCELL/gibberish-detection-model-tr")
tokenizer = AutoTokenizer.from_pretrained("TURKCELL/gibberish-detection-model-tr", do_lower_case=True, use_fast=True)
model.to(device)

def get_result_for_one_sample(model, tokenizer, device, sample):
    # Map the predicted class index to a readable label.
    d = {
        1: 'gibberish',
        0: 'real'
    }
    # Tokenize the sentence as a batch of one and move the tensors to the model's device.
    test_sample = tokenizer([sample], padding=True, truncation=True, max_length=256, return_tensors='pt').to(device)
    # Gradients are not needed for inference.
    with torch.no_grad():
        output = model(**test_sample)
    # Pick the class with the highest logit.
    y_pred = np.argmax(output.logits.to('cpu').numpy(), axis=1)
    return d[int(y_pred[0])]

sentence = "nabeer rdahdaajdajdnjnjf"
result = get_result_for_one_sample(model, tokenizer, device, sentence)
print(result)
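The function above returns only the label. If a confidence score is also useful, the two logits can be passed through a softmax. The following is a minimal sketch, not part of the model's own code: `label_with_confidence` is a hypothetical helper, the logits are made-up values for illustration, and the label mapping (0 = real, 1 = gibberish) is taken from the usage example above.

```python
import math

# Label mapping copied from the usage example: 0 = real, 1 = gibberish.
LABELS = {0: "real", 1: "gibberish"}

def label_with_confidence(logits):
    """Hypothetical helper: turn raw logits into (label, probability)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class (the first index wins on a tie).
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Made-up logits for illustration only, not real model output.
label, confidence = label_with_confidence([-1.2, 2.3])
print(label, round(confidence, 3))
```

With the real model, the input would presumably be `output.logits[0].tolist()` from the usage example. A threshold on the returned probability could then be used to treat low-confidence predictions separately.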