---
license: mit
language:
- tr
pipeline_tag: text-classification
tags:
- text-classification
---

## Model Description

This model was fine-tuned from the [dbmdz/bert-base-turkish-128k-uncased](https://huggingface.co/dbmdz/bert-base-turkish-128k-uncased) model.

It was created to detect gibberish sentences such as "adssnfjnfjn". It is a simple binary classifier that labels a sentence as either gibberish or real.

## Usage

```python
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AutoModelForSequenceClassification.from_pretrained("TURKCELL/gibberish-detection-model-tr")
tokenizer = AutoTokenizer.from_pretrained("TURKCELL/gibberish-detection-model-tr", do_lower_case=True, use_fast=True)

model.to(device)


def get_result_for_one_sample(model, tokenizer, device, sample):
    # Map class indices to human-readable labels.
    d = {
        1: 'gibberish',
        0: 'real'
    }
    # Tokenize the single sentence as a batch of one and move it to the model's device.
    test_sample = tokenizer([sample], padding=True, truncation=True, max_length=256, return_tensors='pt').to(device)
    output = model(**test_sample)
    # Pick the class with the highest logit.
    y_pred = np.argmax(output.logits.detach().to('cpu').numpy(), axis=1)
    return d[y_pred[0]]


sentence = "nabeer rdahdaajdajdnjnjf"
result = get_result_for_one_sample(model, tokenizer, device, sentence)
print(result)
```
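
The snippet above returns only the winning label. If a confidence score is also useful, the logits can be passed through a softmax first. The following is a minimal sketch, not part of the original model card: the helper name `logits_to_predictions` is hypothetical, the label mapping (0 = real, 1 = gibberish) is taken from the usage code above, and the example logits are made up for illustration rather than produced by the model.

```python
import math

# Label mapping copied from the usage snippet (assumed to match the model's classes).
ID2LABEL = {0: "real", 1: "gibberish"}


def logits_to_predictions(logits):
    """Turn rows of raw logits into (label, probability) pairs via softmax."""
    results = []
    for row in logits:
        # Subtract the row maximum before exponentiating for numerical stability.
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Index of the most probable class.
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((ID2LABEL[best], probs[best]))
    return results


# Made-up logits for two sentences; real values would come from the model.
example_logits = [[2.0, -1.0], [-0.5, 3.0]]
for label, prob in logits_to_predictions(example_logits):
    print(f"{label}: {prob:.3f}")
```

With the real model, the input could be obtained as `output.logits.detach().cpu().tolist()`, which yields one row of two logits per sentence.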