Commit 887b7a5, committed by salti
1 parent: e4555b1

Fix typos in README

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -24,9 +24,9 @@ The model could only be trained for about `10%` of the whole dataset due to time
 
  ## Preprocessing and the tokenizer
 
- We tried to keep the preprocessing to the bare minimum. We ony replaced URLs, emails and social media user mentions with fixed tokens.
+ We tried to keep the preprocessing to a bare minimum. We only replaced URLs, emails and social media user mentions with fixed tokens.
 
- Contrary to other pretrained Arabic LMs, we decided to not strip the Arabic diacritics and to keep them in the vocabulary.
+ Contrary to other pretrained Arabic LMs, we decided to not strip the Arabic diacritics and to keep them part of the vocabulary.
 
  The tokenizer was trained on `5%` of the training set, with a vocabulary size of `64'000`.
 
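
For context, the preprocessing described in the changed README lines (replacing URLs, emails and user mentions with fixed tokens, while leaving Arabic diacritics untouched) could look roughly like the sketch below. This is not the authors' code: the placeholder tokens `[URL]`, `[EMAIL]` and `[USER]`, the regular expressions, and the function name `preprocess` are all assumptions made for illustration, since the README does not specify them.

```python
import re

# Hypothetical placeholder tokens; the README does not name the actual ones.
URL_TOKEN = "[URL]"
EMAIL_TOKEN = "[EMAIL]"
USER_TOKEN = "[USER]"

# Deliberately simple patterns, for illustration only.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def preprocess(text: str) -> str:
    """Replace URLs, emails and user mentions with fixed tokens.

    Nothing else is changed, so Arabic diacritics pass through as-is.
    """
    text = URL_RE.sub(URL_TOKEN, text)
    # Emails are handled before mentions so that the "@domain" part of
    # an address is not mistaken for a user mention.
    text = EMAIL_RE.sub(EMAIL_TOKEN, text)
    text = MENTION_RE.sub(USER_TOKEN, text)
    return text
```

Because no normalization step is applied, diacritized text such as `مَرْحَبًا` is returned unchanged, which matches the README's stated choice to keep diacritics in the vocabulary.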