serdarcaglar committed (verified) · Commit 80fb79e · 1 Parent(s): b92d5e0

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -20,11 +20,11 @@ This model is a powerful natural language processing model trained on Turkish sc

 ## Model Details

-- **Data Source**: This model is trained on a custom dataset consisting of Turkish scientific article summaries. The data was collected using web scraping methods from various sources in Turkey, including databases like "trdizin," "yöktez," and "türkiyeklinikleri."
+- **Data Source**: This model is trained on a custom Turkish scientific article summaries dataset. The data was collected from various sources in Turkey, including databases like "trdizin," "yöktez," and "t.k."

 - **Dataset Preprocessing**: The data underwent preprocessing to facilitate better learning. Texts were segmented into sentences, and improperly divided sentences were cleaned. The texts were processed meticulously.

-- **Tokenizer**: The model utilizes a BPE (Byte Pair Encoding) tokenizer to process the data effectively, breaking down the text into subword tokens.
+- **Tokenizer**: The model utilizes a BPE (Byte Pair Encoding) tokenizer to process the data effectively, breaking the text into subword tokens.

 - **Training Details**: The model was trained on a large dataset of Turkish sentences. The training spanned 2M Steps, totaling 3+ days, and the model was built from scratch. No fine-tuning was applied.
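The **Tokenizer** bullet above mentions BPE (Byte Pair Encoding), which builds subword tokens by repeatedly merging the most frequent adjacent symbol pair in a corpus. The model card does not show the actual training code, so the following is only a minimal toy sketch of the merge-learning loop on a few made-up Turkish words (not the model's real vocabulary or tokenizer):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

def learn_bpe(corpus, num_merges):
    # Start from individual characters; mark word ends with "</w>".
    words = Counter(tuple(w) + ("</w>",) for w in corpus)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

# Toy corpus (hypothetical): frequent prefix "bil" gets merged into one token.
merges, vocab = learn_bpe(["bilim", "bilgi", "bilgin", "bilgi"], num_merges=3)
print(merges)  # e.g. [('b', 'i'), ('bi', 'l'), ('bil', 'g')]
```

Real BPE tokenizers (such as those trained with the Hugging Face `tokenizers` library) follow the same idea but learn tens of thousands of merges and handle byte-level fallback for unseen characters.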