---
license: mit
language:
- tr
pipeline_tag: text-classification
tags:
- text-classification
---
## Model Description
This model was fine-tuned from [dbmdz/bert-base-turkish-128k-uncased](https://huggingface.co/dbmdz/bert-base-turkish-128k-uncased).
It was created to detect gibberish sentences such as "adssnfjnfjn".
It is a simple binary classifier that labels a sentence as either gibberish or real.
## Usage
```python
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Use the first GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = AutoModelForSequenceClassification.from_pretrained("TURKCELL/gibberish-detection-model-tr")
tokenizer = AutoTokenizer.from_pretrained("TURKCELL/gibberish-detection-model-tr", do_lower_case=True, use_fast=True)
model.to(device)


def get_result_for_one_sample(model, tokenizer, device, sample):
    # Map the predicted class index to its label.
    d = {
        1: 'gibberish',
        0: 'real'
    }
    # Tokenize the sentence as a batch of one and move the tensors to the model's device.
    test_sample = tokenizer([sample], padding=True, truncation=True, max_length=256, return_tensors='pt').to(device)
    # Gradients are not needed for inference.
    with torch.no_grad():
        output = model(**test_sample)
    # Pick the class with the highest logit.
    y_pred = np.argmax(output.logits.detach().to('cpu').numpy(), axis=1)
    return d[int(y_pred[0])]


sentence = "nabeer rdahdaajdajdnjnjf"
result = get_result_for_one_sample(model, tokenizer, device, sentence)
print(result)
```
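If a confidence score is useful alongside the label, the two logits can be converted to probabilities with a softmax. The sketch below is only an illustration: `label_with_confidence` is a hypothetical helper, the logit values are made up, and it assumes the same class mapping as the usage example (0 is real, 1 is gibberish).

```python
import math

# Same mapping as in the usage example: class index -> label.
ID2LABEL = {0: "real", 1: "gibberish"}


def label_with_confidence(logits):
    """Hypothetical helper: softmax over one row of logits, then pick the top class."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration; with the real model, one could pass
# output.logits[0].tolist() from the usage example instead.
label, confidence = label_with_confidence([-1.0, 1.0])
print(label, round(confidence, 3))
```

A low confidence value may be a reason to treat the prediction with caution, though any threshold would need to be chosen on your own data.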