# RoBERTa Pretrained on Smaller Datasets
We pretrain RoBERTa on smaller datasets (1M, 10M, 100M, and 1B tokens). For each pretraining data size, we release the three models with the lowest validation perplexity out of 25 runs (10 in the case of 1B tokens). The pretraining data reproduces that of BERT: we combine English Wikipedia with a reproduction of BookCorpus built from Smashwords texts, in a ratio of approximately 3:1.
## Hyperparameters and Validation Perplexity
The hyperparameters and validation perplexities corresponding to each model are as follows:
Model Name | Training Size | Model Size | Max Steps | Batch Size | Validation Perplexity |
---|---|---|---|---|---|
roberta-base-1B-1 | 1B | BASE | 100K | 512 | 3.93 |
roberta-base-1B-2 | 1B | BASE | 31K | 1024 | 4.25 |
roberta-base-1B-3 | 1B | BASE | 31K | 4096 | 3.84 |
roberta-base-100M-1 | 100M | BASE | 100K | 512 | 4.99 |
roberta-base-100M-2 | 100M | BASE | 31K | 1024 | 4.61 |
roberta-base-100M-3 | 100M | BASE | 31K | 512 | 5.02 |
roberta-base-10M-1 | 10M | BASE | 10K | 1024 | 11.31 |
roberta-base-10M-2 | 10M | BASE | 10K | 512 | 10.78 |
roberta-base-10M-3 | 10M | BASE | 31K | 512 | 11.58 |
roberta-med-small-1M-1 | 1M | MED-SMALL | 100K | 512 | 153.38 |
roberta-med-small-1M-2 | 1M | MED-SMALL | 10K | 512 | 134.18 |
roberta-med-small-1M-3 | 1M | MED-SMALL | 31K | 512 | 139.39 |
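The checkpoint names above map directly to Hugging Face Hub model ids. A minimal sketch of constructing them, assuming the `nyu-mll` namespace (an assumption; the card itself does not state the repo owner):

```python
# Build Hub ids for the 12 released checkpoints in the table above.
# NOTE: the "nyu-mll" namespace is an assumption, not stated in this card.
NAMESPACE = "nyu-mll"

def hub_id(size: str, run: int) -> str:
    """Return a checkpoint id such as 'nyu-mll/roberta-base-10M-2'."""
    # Only the 1M models use the MED-SMALL architecture and naming.
    prefix = "roberta-med-small" if size == "1M" else "roberta-base"
    return f"{NAMESPACE}/{prefix}-{size}-{run}"

# Three released runs per pretraining size.
checkpoints = [
    hub_id(size, run)
    for size in ("1B", "100M", "10M", "1M")
    for run in (1, 2, 3)
]

# Any of these ids can then be loaded with the transformers library, e.g.:
#   from transformers import AutoTokenizer, AutoModelForMaskedLM
#   tokenizer = AutoTokenizer.from_pretrained(checkpoints[0])
#   model = AutoModelForMaskedLM.from_pretrained(checkpoints[0])
```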
The hyperparameters corresponding to the model sizes mentioned above are as follows:
Model Size | L | AH | HS | FFN | P |
---|---|---|---|---|---|
BASE | 12 | 12 | 768 | 3072 | 125M |
MED-SMALL | 6 | 8 | 512 | 2048 | 45M |
(L = number of layers; AH = number of attention heads; HS = hidden size; FFN = feedforward network dimension; P = number of parameters.)
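As a rough sanity check on the parameter counts in the table, the weight matrices alone nearly account for them. The sketch below assumes RoBERTa's 50,265-token BPE vocabulary (an assumption; the card does not state vocab size) and counts only the token embeddings, attention projections, and FFN projections, omitting biases, layer norms, and position embeddings:

```python
# Approximate parameter count from L, HS, and FFN in the table above.
# Assumes RoBERTa's BPE vocabulary size; biases/layer norms/position
# embeddings are omitted, so the result slightly undercounts.
VOCAB = 50_265  # assumed RoBERTa vocabulary size

def approx_params(layers: int, hidden: int, ffn: int) -> int:
    embeddings = VOCAB * hidden       # token embedding matrix
    attention = 4 * hidden * hidden   # Q, K, V, and output projections
    ffn_block = 2 * hidden * ffn      # up- and down-projection
    return embeddings + layers * (attention + ffn_block)

base = approx_params(12, 768, 3072)       # ~124M, vs. the reported 125M
med_small = approx_params(6, 512, 2048)   # ~45M, matching the reported 45M
```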
For other hyperparameters, we select:
- Peak learning rate: 5e-4
- Warmup steps: 6% of max steps
- Dropout: 0.1
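The shared settings above can be sketched as a small config helper; the warmup schedule depends only on each run's step budget from the first table:

```python
# Shared optimization settings from the list above; max_steps varies per run.
OPTIM = {"peak_lr": 5e-4, "dropout": 0.1, "warmup_frac": 0.06}

def warmup_steps(max_steps: int, warmup_frac: float = OPTIM["warmup_frac"]) -> int:
    """Warmup lasts 6% of the total step budget."""
    return round(max_steps * warmup_frac)

# e.g. a 100K-step run warms up for its first 6,000 steps:
print(warmup_steps(100_000))  # -> 6000
```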