Pretrained bidirectional encoder for the Russian language.
The model was trained with the standard masked language modeling (MLM) objective on large text corpora, including open social data. See the Training Details section for more information.
⚠️ This model contains only the encoder part without any pretrained head.
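```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the bare encoder (no task-specific head)
tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")
model = AutoModel.from_pretrained("deepvk/deberta-v1-base")

text = "Привет, мир!"  # "Hello, world!"
inputs = tokenizer(text, return_tensors='pt')
# The output holds contextual token embeddings in `last_hidden_state`
predictions = model(**inputs)
```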
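Because the checkpoint has no head, downstream use usually starts from `predictions.last_hidden_state`, a tensor of shape `(batch, seq_len, 768)`. One common way to get a single sentence vector (an assumption here, not something this card prescribes) is mean pooling over the non-padding tokens indicated by `inputs["attention_mask"]`. The sketch below uses a hypothetical helper, `mean_pool`, to spell out that arithmetic in plain Python for one sequence. With PyTorch tensors, the same idea is a masked sum divided by the number of unmasked tokens.

```python
from typing import List, Sequence


def mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors of one sequence, skipping padding positions.

    hidden: one vector per token (e.g. one row of last_hidden_state)
    mask:   attention mask for the same tokens, 1 = real token, 0 = padding
    """
    kept = [vec for vec, m in zip(hidden, mask) if m == 1]
    if not kept:
        raise ValueError("sequence has no unmasked tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: two real tokens and one padding token, with 2-dim vectors
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))  # [2.0, 3.0]
```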
The training data comprises 400 GB of filtered and deduplicated text in total, drawn from a mix of Wikipedia, books, Twitter comments, Pikabu, Proza.ru, film subtitles, news websites, and the Social corpus.
| Argument | Value |
|---|---|
| Training regime | fp16 mixed precision |
| Optimizer | AdamW |
| Adam betas | 0.9, 0.98 |
| Adam eps | 1e-6 |
| Weight decay | 1e-2 |
| Batch size | 2240 |
| Num training steps | 1M |
| Num warm-up steps | 10k |
| LR scheduler | Linear |
| LR | 2e-5 |
| Max gradient norm (clipping) | 1.0 |
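The table only says "Linear" for the scheduler, so the exact shape is an assumption here. The sketch below ramps the learning rate linearly from 0 to the peak of 2e-5 over the first 10k steps, then decays it linearly to 0 at step 1M. This is the behaviour of `get_linear_schedule_with_warmup` in Hugging Face transformers. `linear_warmup_decay` is a hypothetical helper, not code from the original training run.

```python
def linear_warmup_decay(step: int,
                        peak_lr: float = 2e-5,
                        warmup_steps: int = 10_000,
                        total_steps: int = 1_000_000) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape).

    Ramps linearly from 0 to peak_lr during warm-up, then decays linearly
    so that it reaches 0 at total_steps and stays there afterwards.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


for s in (0, 5_000, 10_000, 505_000, 1_000_000):
    print(s, linear_warmup_decay(s))
```

In PyTorch, the optimizer rows would map to `torch.optim.AdamW(params, lr=2e-5, betas=(0.9, 0.98), eps=1e-6, weight_decay=1e-2)`. If "Gradient norm" means clipping, it would correspond to `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` before each optimizer step.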
The model was trained on a single machine with 8× NVIDIA A100 GPUs for approximately 30 days.
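These figures allow a back-of-envelope throughput estimate, shown below. It assumes exactly 30 days of wall-clock time and a constant step rate. It also assumes that the batch size of 2240 counts sequences summed over all 8 GPUs. The card states none of these assumptions, so treat the result as an order-of-magnitude reading only.

```python
# Back-of-envelope estimate; the assumptions are marked on each line.
SECONDS = 30 * 24 * 3600   # "approximately 30 days", taken as exactly 30
STEPS = 1_000_000          # Num training steps
BATCH = 2240               # assumed: sequences per step, summed over all GPUs
GPUS = 8                   # 8x A100

sec_per_step = SECONDS / STEPS
seq_per_sec = BATCH / sec_per_step
seq_per_sec_per_gpu = seq_per_sec / GPUS
total_sequences = BATCH * STEPS

print(f"{sec_per_step:.3f} s/step")
print(f"{seq_per_sec:.0f} sequences/s overall, {seq_per_sec_per_gpu:.0f} per GPU")
print(f"{total_sequences:,} sequences processed in total")
```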
| Argument | Value |
|---|---|
| Encoder layers | 12 |
| Encoder attention heads | 12 |
| Encoder embed dim | 768 |
| Encoder FFN embed dim | 3072 |
| Activation function | GELU |
| Attention dropout | 0.1 |
| Dropout | 0.1 |
| Max positions | 512 |
| Vocab size | 50266 |
| Tokenizer type | Byte-level BPE |
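The hypothetical `EncoderConfig` dataclass below mirrors the values in the architecture table. It derives two quantities that the table implies: the per-head dimension (768 / 12 = 64) and the FFN expansion ratio (3072 / 768 = 4). It is only an illustration. For the real checkpoint, the values can be read with `AutoConfig.from_pretrained("deepvk/deberta-v1-base")`, which uses standard transformers field names such as `hidden_size`, `num_hidden_layers`, `num_attention_heads`, and `intermediate_size` instead of the names used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container mirroring the architecture table."""
    num_layers: int = 12
    num_heads: int = 12
    hidden_size: int = 768        # "Encoder embed dim"
    ffn_size: int = 3072          # "Encoder FFN embed dim"
    activation: str = "gelu"
    attention_dropout: float = 0.1
    dropout: float = 0.1
    max_positions: int = 512
    vocab_size: int = 50266

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden vector
        if self.hidden_size % self.num_heads:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    @property
    def ffn_ratio(self) -> float:
        # How much wider the feed-forward layer is than the hidden size
        return self.ffn_size / self.hidden_size


cfg = EncoderConfig()
print(cfg.head_dim, cfg.ffn_ratio)  # 64 4.0
```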
We evaluated the model on the Russian SuperGLUE dev set. The best result for each task is marked in bold. All models are the same size except the distilled version of DeBERTa.
| Model | RCB | PARus | MuSeRC | TERRa | RUSSE | RWSD | DaNetQA | Score |
|---|---|---|---|---|---|---|---|---|
| vk-deberta-distill | 0.433 | 0.56 | 0.625 | 0.59 | 0.943 | 0.569 | 0.726 | 0.635 |
| vk-roberta-base | 0.46 | 0.56 | 0.679 | **0.769** | 0.960 | 0.569 | 0.658 | 0.665 |
| vk-deberta-base | 0.450 | **0.61** | **0.722** | 0.704 | 0.948 | 0.578 | **0.76** | **0.682** |
| vk-bert-base | 0.467 | 0.57 | 0.587 | 0.704 | 0.953 | **0.583** | 0.737 | 0.657 |
| sber-bert-base | **0.491** | **0.61** | 0.663 | **0.769** | **0.962** | 0.574 | 0.678 | 0.678 |
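The card does not say how the Score column is computed. However, recomputing it as the unweighted mean of the seven task metrics reproduces every Score value in the table to three decimals, as the sketch below shows. This reading is inferred from the numbers rather than stated by the authors.

```python
# Per-task dev-set metrics copied from the results table, in column order:
# RCB, PARus, MuSeRC, TERRa, RUSSE, RWSD, DaNetQA
RESULTS = {
    "vk-deberta-distill": [0.433, 0.56, 0.625, 0.59, 0.943, 0.569, 0.726],
    "vk-roberta-base": [0.46, 0.56, 0.679, 0.769, 0.960, 0.569, 0.658],
    "vk-deberta-base": [0.450, 0.61, 0.722, 0.704, 0.948, 0.578, 0.76],
    "vk-bert-base": [0.467, 0.57, 0.587, 0.704, 0.953, 0.583, 0.737],
    "sber-bert-base": [0.491, 0.61, 0.663, 0.769, 0.962, 0.574, 0.678],
}


def mean_score(values: list[float]) -> float:
    """Unweighted mean of the task metrics."""
    return sum(values) / len(values)


for name, values in RESULTS.items():
    print(f"{name:20s} {mean_score(values):.3f}")

# Model with the highest mean over the seven tasks
best = max(RESULTS, key=lambda name: mean_score(RESULTS[name]))
print("highest mean:", best)
```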