---
license: mit
language:
- tr
pipeline_tag: text-classification
tags:
- text-classification
---

## Model Description
This model was fine-tuned from [dbmdz/bert-base-turkish-128k-uncased](https://huggingface.co/dbmdz/bert-base-turkish-128k-uncased).

It was created to detect gibberish sentences such as "adssnfjnfjn".
It is a simple binary classification model that labels a sentence as either gibberish or real.

## Usage

```python
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AutoModelForSequenceClassification.from_pretrained("TURKCELL/gibberish-detection-model-tr")
tokenizer = AutoTokenizer.from_pretrained("TURKCELL/gibberish-detection-model-tr", do_lower_case=True, use_fast=True)

model.to(device)

def get_result_for_one_sample(model, tokenizer, device, sample):
    # Map class indices to human-readable labels
    d = {
        1: 'gibberish',
        0: 'real'
    }
    # Tokenize the sentence and move the tensors to the model's device
    test_sample = tokenizer([sample], padding=True, truncation=True, max_length=256, return_tensors='pt').to(device)
    output = model(**test_sample)
    # Pick the class with the highest logit
    y_pred = np.argmax(output.logits.detach().to('cpu').numpy(), axis=1)
    return d[y_pred[0]]

sentence = "nabeer rdahdaajdajdnjnjf"
result = get_result_for_one_sample(model, tokenizer, device, sentence)
print(result)
```
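If you need to classify many sentences at once, the same model and tokenizer can be run in batches. The sketch below is a minimal variant of the function above; the helper name `get_results_for_batch` and the `batch_size` parameter are illustrative assumptions, not part of the released code.

```python
import numpy as np
import torch

def get_results_for_batch(model, tokenizer, device, sentences, batch_size=32):
    # Hypothetical helper: classifies a list of sentences in mini-batches.
    d = {1: 'gibberish', 0: 'real'}
    results = []
    model.eval()
    with torch.no_grad():  # no gradients needed for inference
        for i in range(0, len(sentences), batch_size):
            batch = sentences[i:i + batch_size]
            enc = tokenizer(batch, padding=True, truncation=True, max_length=256, return_tensors='pt').to(device)
            logits = model(**enc).logits
            preds = np.argmax(logits.detach().to('cpu').numpy(), axis=1)
            results.extend(d[p] for p in preds)
    return results

print(get_results_for_batch(model, tokenizer, device, ["merhaba dünya", "asdkfjasdkjf"]))
```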