---
language: tr
tags:
- roberta
- language-model
- scientific
- turkish
- fill-mask
license: mit
model_author: Serdar ÇAĞLAR
widget:
- text: "Sekiz hastada <mask> çıkarılması gerekti (n=5 kateter trombozu, n=2 kateter enfeksiyonu ve n=1 büyük hematom)."
- text: "Santral <mask> kateterizasyona bağlı süperior vena kava perforasyonun video yardımlı torakoskopik cerrahi ile tedavisi"
- text: "Akut lenfoblastik <mask> tanısı sırasında yapılan genetik çalışmalar, tedavi yoğunluğunun belirlenmesini ve hedefe yönelik tedavilerin planlanmasını sağlar."
---
🇹🇷
# Roberta-Based Language Model Trained on Turkish Scientific Article Abstracts
This is a RoBERTa-based masked language model trained from scratch on Turkish scientific article abstracts. It specializes in scientific Turkish and can serve as a base for downstream natural language processing tasks such as fill-mask prediction, text comprehension, and summarization.
## Model Details
- **Data Source**: The model is trained on a custom dataset of Turkish scientific article abstracts, collected from several Turkish sources, including databases such as "trdizin," "yöktez," and "t.k."
- **Dataset Preprocessing**: The data was preprocessed before training: texts were segmented into sentences, and improperly split sentences were cleaned.
- **Tokenizer**: The model utilizes a BPE (Byte Pair Encoding) tokenizer to process the data effectively, breaking the text into subword tokens.
- **Training Details**: The model was trained from scratch on a large corpus of Turkish sentences for 2M steps over more than 3 days. No fine-tuning was applied.
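To illustrate how BPE merges subword units, the sketch below implements the classic merge-learning loop on a toy word-frequency table. This is an illustrative reimplementation, not the model's actual tokenizer (which ships with the model and should be loaded via `AutoTokenizer`); the example words are arbitrary.

```python
# Illustrative sketch of BPE merge learning, NOT the model's actual
# tokenizer. It repeatedly merges the most frequent adjacent symbol pair.
from collections import Counter

def learn_bpe_merges(word_freqs, num_merges):
    """Learn BPE merge rules from a {word: count} dict."""
    # Start with each word as a tuple of single characters.
    vocab = {tuple(w): c for w, c in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge to every word in the vocabulary.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

# Toy example: "te" is the most frequent pair, so it is merged first.
merges = learn_bpe_merges({"kateter": 5, "kateterizasyon": 3}, 4)
```

The real tokenizer additionally operates at the byte level and records the learned merges in its vocabulary files, but the merge loop above is the core idea.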
## Usage
Load the model and tokenizer with the 🤗 Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("serdarcaglar/roberta-base-turkish-scientific-abstract")
model = AutoModelForMaskedLM.from_pretrained("serdarcaglar/roberta-base-turkish-scientific-abstract")
```
## Fill-Mask Usage
```python
from transformers import pipeline
fill_mask = pipeline(
    "fill-mask",
    model="serdarcaglar/roberta-base-turkish-scientific-abstract",
    tokenizer="serdarcaglar/roberta-base-turkish-scientific-abstract",
)
fill_mask("İnterarteriyel seyirli anormal <mask> arter hastaları ne zaman ameliyat edilmeli ve hangi cerrahi teknik kullanılmalıdır?")
[{'score': 0.7180812954902649,
'token': 6252,
'token_str': ' koroner',
'sequence': 'İnterarteriyel seyirli anormal koroner arter hastaları ne zaman ameliyat edilmeli ve hangi cerrahi teknik kullanılmalıdır?'},
{'score': 0.09322144836187363,
'token': 9978,
'token_str': ' pulmoner',
'sequence': 'İnterarteriyel seyirli anormal pulmoner arter hastaları ne zaman ameliyat edilmeli ve hangi cerrahi teknik kullanılmalıdır?'},
{'score': 0.03268029913306236,
'token': 16407,
'token_str': ' uterin',
'sequence': 'İnterarteriyel seyirli anormal uterin arter hastaları ne zaman ameliyat edilmeli ve hangi cerrahi teknik kullanılmalıdır?'},
{'score': 0.012145915068686008,
'token': 12969,
'token_str': ' renal',
'sequence': 'İnterarteriyel seyirli anormal renal arter hastaları ne zaman ameliyat edilmeli ve hangi cerrahi teknik kullanılmalıdır?'},
{'score': 0.011508156545460224,
'token': 26256,
'token_str': ' karotis',
'sequence': 'İnterarteriyel seyirli anormal karotis arter hastaları ne zaman ameliyat edilmeli ve hangi cerrahi teknik kullanılmalıdır?'}]
```
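The pipeline returns candidates sorted by score, so extracting the best completion is a one-liner. The sketch below uses a truncated copy of the output shown above as literal data, so it runs without downloading the model; `top_prediction` is a hypothetical helper, not part of the Transformers API.

```python
# Sketch: pick the highest-scoring candidate from fill-mask output.
# The list is a truncated copy of the pipeline output shown above,
# pasted as literal data so this runs without downloading the model.
predictions = [
    {"score": 0.7181, "token_str": " koroner"},
    {"score": 0.0932, "token_str": " pulmoner"},
    {"score": 0.0327, "token_str": " uterin"},
]

def top_prediction(preds):
    """Return the token string of the highest-scoring candidate."""
    return max(preds, key=lambda p: p["score"])["token_str"].strip()

print(top_prediction(predictions))  # prints "koroner"
```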
## Disclaimer
The use of this model is subject to compliance with specific copyright and legal regulations, which are the responsibility of the users. The model owner or provider cannot be held liable for any issues arising from the use of the model.
### Contact information
For further information, contact [Serdar ÇAĞLAR](https://www.linkedin.com/in/serdarildercaglar/) at <serdarildercaglar@gmail.com>.