---
language:
- kbd
- ru
- multilingual
license: unknown
tags:
- circassian
- kabardian
datasets:
- anzorq/kbd_lat-835k_ru-3M
---

t5-v1_1-small pretrained with an MLM task on:

- kbd (custom Latin script), 835K lines: scraped text from news sites, books, etc.
- ru, 3M lines: wiki corpus from OPUS
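
The card does not spell out the MLM task. For T5 models it usually means span corruption, where masked spans are replaced by sentinel tokens such as `<extra_id_0>` and the target lists the dropped text. The sketch below assumes that objective; the `span_corrupt` helper and the hand-picked spans are illustrative only and are not taken from the actual training code.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel; return (inputs, targets).

    Assumes T5-style span corruption. `spans` must be non-overlapping
    half-open index ranges into `tokens`.
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # mark the hole in the input
        targets.append(sentinel)             # announce the hole in the target
        targets.extend(tokens[start:end])    # followed by the dropped text
        cursor = end
    inputs.extend(tokens[cursor:])           # keep text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


tokens = "Thank you for inviting me to your party last week".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

In real pretraining the spans are sampled at random over subword tokens rather than chosen by hand over whitespace-split words.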


Tokenizer: SentencePiece unigram, 8K vocabulary shared across both languages.
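
A shared unigram vocabulary of this kind could be trained roughly as follows. This is a sketch under assumptions, not the script used for this model: the file names `kbd_lat.txt` and `ru.txt`, the model prefix, the reading of "8K" as 8000, and the `character_coverage` value are all hypothetical.

```python
def build_trainer_args(corpus_paths, model_prefix="kbd_ru_unigram", vocab_size=8000):
    """Collect SentencePiece trainer options for one vocabulary over all corpora."""
    return {
        "input": ",".join(corpus_paths),  # several files feed a single shared vocab
        "model_prefix": model_prefix,     # hypothetical output name
        "vocab_size": vocab_size,         # "8K" taken as 8000 here (assumption)
        "model_type": "unigram",
        "character_coverage": 1.0,        # assumption: keep every character
    }


def train_shared_tokenizer(corpus_paths, **overrides):
    """Train the tokenizer; requires the `sentencepiece` package and real files."""
    import sentencepiece as spm  # imported lazily so the sketch loads without it

    args = build_trainer_args(corpus_paths)
    args.update(overrides)
    spm.SentencePieceTrainer.train(**args)
    return args["model_prefix"] + ".model"


# Hypothetical corpus file names, one per language.
args = build_trainer_args(["kbd_lat.txt", "ru.txt"])
print(args["input"])
```

Calling `train_shared_tokenizer(["kbd_lat.txt", "ru.txt"])` would write `kbd_ru_unigram.model` and `kbd_ru_unigram.vocab` to the working directory. Because the Russian corpus is several times larger, sampling the two files to a more balanced ratio before training may be worth considering.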