metinovadilet committed on
Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ tags:
 - kyrgyz
 - tokenizer
 ---
-A tokenizer tailored for the Kyrgyz language, utilizing SentencePiece with Byte Pair Encoding (BPE) to offer efficient and precise tokenization. It features a
+A tokenizer tailored for the Kyrgyz language, utilizing SentencePiece with Byte Pair Encoding (BPE) to offer efficient and precise tokenization. It features a 100,000-subword vocabulary, ensuring optimal performance for various Kyrgyz NLP tasks. This tokenizer was developed in collaboration with UlutSoft LLC to reflect authentic Kyrgyz language usage.
 Features:
 
 Language: Kyrgyz
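The updated README says the tokenizer uses SentencePiece in BPE mode with a 100,000-subword vocabulary. The toy Python sketch below illustrates only the core BPE idea on a handful of Kyrgyz words: repeatedly merge the most frequent adjacent symbol pair. It is not SentencePiece's implementation, which trains on raw text, marks whitespace with the `▁` symbol, and is heavily optimized. The corpus and the helpers `learn_bpe`, `apply_bpe` and `_merge_pair` are hypothetical, chosen for illustration.

```python
from collections import Counter


def _merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` in `symbols` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Toy BPE training: learn up to `num_merges` merges from a list of words.

    Each word starts as a sequence of characters; each step merges the
    most frequent adjacent pair across the whole (hypothetical) corpus.
    """
    vocab = Counter(tuple(w) for w in words)  # segmented word -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(_merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in training order."""
    symbols = list(word)
    for pair in merges:
        symbols = _merge_pair(symbols, pair)
    return symbols


# Hypothetical mini-corpus of Kyrgyz words (a real run would use a large text corpus).
corpus = "китеп китептер китепкана мектеп мектептер".split()
merges = learn_bpe(corpus, num_merges=5)
print(merges[:2])
print(apply_bpe("китептер", merges))
```

Here the first merge is the pair `т` + `е`, the most frequent pair in this tiny corpus. Training stops after a fixed number of merges, whereas a production vocabulary is grown to a target size. With the actual `sentencepiece` library, a comparable model would be trained via `SentencePieceTrainer.train` with `model_type='bpe'` and `vocab_size=100000`. The README does not state the training corpus or any other options used for this tokenizer.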