
rugpt3small_based_on_gpt2 safetensors variant

The model was trained by the SberDevices team using the transformers library, with a sequence length of 1024, on 80B tokens for around 3 epochs. It was then fine-tuned with a 2048-token context.

Total training time was around one week on 32 GPUs.
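A minimal usage sketch for loading the safetensors variant with transformers. The repo id below is an assumption inferred from the model name and may differ for this particular variant; `use_safetensors=True` asks `from_pretrained` to prefer the `*.safetensors` weights.

```python
# Sketch: load the model and generate a continuation.
# REPO_ID is an assumption and may not match this exact variant.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

REPO_ID = "ai-forever/rugpt3small_based_on_gpt2"  # assumed repo id

def generate(prompt: str, max_new_tokens: int = 40) -> str:
    """Load the model (preferring safetensors weights) and continue `prompt`."""
    tokenizer = GPT2Tokenizer.from_pretrained(REPO_ID)
    model = GPT2LMHeadModel.from_pretrained(REPO_ID, use_safetensors=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Александр Сергеевич Пушкин родился в"))
```

The first call downloads the weights from the Hub; subsequent calls use the local cache.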

Authors

SberDevices team

Safetensors

Model size: 125M params
Tensor type: F32
Inference API

This model can be loaded via the serverless Inference API.