This model is a powerful natural language processing model trained on Turkish scientific article summaries.

## Model Details
- **Data Source**: The model was trained on a custom dataset of Turkish scientific article summaries, collected from various Turkish sources, including databases such as "trdizin," "yöktez," and "t.k."

- **Dataset Preprocessing**: The data was preprocessed to facilitate learning: texts were segmented into sentences, and improperly divided sentences were cleaned out (see the segmentation sketch after this list).

- **Tokenizer**: The model uses a BPE (Byte Pair Encoding) tokenizer that breaks text into subword tokens (see the loading example after this list).

- **Training Details**: The model was trained from scratch on a large corpus of Turkish sentences for 2M steps, taking more than 3 days. No fine-tuning was applied (an illustrative configuration follows this list).

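The exact preprocessing pipeline is not published here, so the following is only a minimal sketch of the segmentation-and-cleaning step described above. The regex split, the `min_words` threshold, and the terminal-punctuation check are illustrative assumptions, not the actual pipeline:

```python
import re

def segment_and_clean(text: str, min_words: int = 3) -> list[str]:
    """Split raw text into sentences and drop improperly divided ones.

    Illustrative heuristics only; the model's actual preprocessing
    pipeline is not published.
    """
    # Naive sentence segmentation on terminal punctuation.
    candidates = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = []
    for s in candidates:
        s = s.strip()
        # Fragments that are very short or lack terminal punctuation
        # usually indicate an improperly divided sentence; drop them.
        if len(s.split()) >= min_words and s[-1] in ".!?":
            sentences.append(s)
    return sentences

print(segment_and_clean("Bu bir örnek cümledir. Kısa. Model bilimsel metinler üzerinde eğitildi!"))
```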
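Loading the tokenizer should follow the standard Hugging Face pattern. The repository id below is a placeholder, not the model's confirmed id; substitute the actual id from this model's Hub page:

```python
from transformers import AutoTokenizer

# Placeholder repository id: replace with this model's actual Hub id.
tokenizer = AutoTokenizer.from_pretrained("serdarcaglar/<model-id>")

# The BPE tokenizer breaks unseen words into subword tokens.
print(tokenizer.tokenize("Türkçe bilimsel makale özetleri"))
```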
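For the training schedule, only the step count, the wall-clock time, and the from-scratch setup are reported above. Below is a sketch of a matching `transformers` configuration; every value other than `max_steps` is an assumption for illustration:

```python
from transformers import TrainingArguments

# Configuration consistent with the reported schedule (2M steps,
# trained from scratch). Values marked "assumed" were not reported.
args = TrainingArguments(
    output_dir="checkpoints",
    max_steps=2_000_000,             # reported: 2M steps
    per_device_train_batch_size=32,  # assumed
    learning_rate=1e-4,              # assumed
    save_steps=50_000,               # assumed
)
```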