---
datasets:
- wikimovies
language:
- en
thumbnail:
tags:
- roberta
- roberta-base
- masked-language-modeling
license: cc-by-4.0
---
# roberta-base for MLM
```
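from transformers import pipeline

model_name = "thatdramebaazguy/roberta-base-wikimovies"
# Task identifiers are lowercase in transformers ("fill-mask", not "Fill-Mask").
fill_mask = pipeline(task="fill-mask", model=model_name, tokenizer=model_name, revision="v1.0")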
```
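
The snippet above only builds the pipeline. As a rough sketch of how it might then be queried, the example below passes a sentence containing RoBERTa's `<mask>` token and prints the top candidates. The prompt sentence and the helper names `format_predictions` and `demo` are illustrative assumptions, not part of the original project.

```python
def format_predictions(results, top_k=3):
    """Turn fill-mask output (a list of dicts) into short readable lines."""
    lines = []
    for item in results[:top_k]:
        # Each result dict carries the predicted token and its probability.
        lines.append(f"{item['token_str'].strip()!r}: {item['score']:.3f}")
    return lines


def demo():
    """Download the model and fill one mask (needs transformers and network access)."""
    from transformers import pipeline

    model_name = "thatdramebaazguy/roberta-base-wikimovies"
    fill_mask = pipeline(task="fill-mask", model=model_name, tokenizer=model_name, revision="v1.0")
    # Hypothetical prompt; RoBERTa tokenizers use "<mask>" as the mask token.
    results = fill_mask("The movie was directed by <mask> Spielberg.")
    print("\n".join(format_predictions(results)))
```

Calling `demo()` runs the full example; `format_predictions` can be reused on any fill-mask result list.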
## Overview
**Language model:** roberta-base
**Language:** English
**Downstream-task:** Fill-Mask
**Training data:** wikimovies
**Eval data:** wikimovies
**Infrastructure:** 2x Tesla V100
**Code:** See [example](https://github.com/adityaarunsinghal/Domain-Adaptation/blob/master/shell_scripts/train_movie_roberta.sh)
## Hyperparameters
```
num_examples = 4346
batch_size = 16
n_epochs = 3
base_LM_model = "roberta-base"
learning_rate = 5e-05
max_query_length = 64
Gradient Accumulation steps = 1
Total optimization steps = 816
evaluation_strategy=IntervalStrategy.NO
prediction_loss_only=False
per_device_train_batch_size=8
per_device_eval_batch_size=8
adam_beta1=0.9
adam_beta2=0.999
adam_epsilon=1e-08
max_grad_norm=1.0
lr_scheduler_type=SchedulerType.LINEAR
warmup_ratio=0.0
seed=42
eval_steps=500
metric_for_best_model=None
greater_is_better=None
label_smoothing_factor=0.0
```
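
The batch size and step count listed above are consistent with each other, which the following arithmetic sketch tries to make explicit. It assumes that training ran data-parallel across both GPUs and that the last partial batch of each epoch was kept; neither detail is stated in this card.

```python
import math

num_examples = 4346
per_device_train_batch_size = 8
n_gpus = 2  # assumption: both GPUs from "2x Tesla V100" were used data-parallel
gradient_accumulation_steps = 1
n_epochs = 3

# Effective batch size per optimizer step across all devices.
effective_batch_size = per_device_train_batch_size * n_gpus * gradient_accumulation_steps

# Assumption: the final, smaller batch of each epoch is not dropped.
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * n_epochs

print(effective_batch_size, steps_per_epoch, total_steps)  # 16 272 816
```

Under these assumptions the result matches `batch_size = 16` and `Total optimization steps = 816`.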
## Performance
perplexity = 4.3808
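
The card does not say how this figure was computed. A common convention for masked language models is to report the exponential of the mean evaluation cross-entropy loss (in nats). Assuming that convention applies here, the sketch below shows the relationship and recovers the loss implied by the reported value. The helper name `perplexity_from_loss` is illustrative.

```python
import math


def perplexity_from_loss(mean_loss):
    """Perplexity as exp of the mean cross-entropy loss (natural log base)."""
    return math.exp(mean_loss)


reported_perplexity = 4.3808
# Invert the relationship to see which mean loss the reported value implies.
implied_loss = math.log(reported_perplexity)

print(f"implied mean loss: {implied_loss:.4f}")  # roughly 1.48
print(f"round trip: {perplexity_from_loss(implied_loss):.4f}")
```
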
Some of my work:
- [Domain-Adaptation Project](https://github.com/adityaarunsinghal/Domain-Adaptation/)
---