---
datasets:
- wikimovies

language: 
- en

thumbnail: 

tags:
- roberta
- roberta-base
- masked-language-modeling 

license: cc-by-4.0

---
# roberta-base for MLM 

```
from transformers import pipeline

model_name = "thatdramebaazguy/roberta-base-wikimovies"
fill_mask = pipeline(task="fill-mask", model=model_name, tokenizer=model_name, revision="v1.0")
```
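
A quick usage sketch (the masked sentence below is purely illustrative, not taken from the training data):

```
predictions = fill_mask("The Dark Knight is a 2008 superhero <mask> directed by Christopher Nolan.")
print(predictions[0]["sequence"], predictions[0]["score"])
```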
## Overview
**Language model:** roberta-base  
**Language:** English  
**Downstream-task:** Fill-Mask  
**Training data:** wikimovies  
**Eval data:** wikimovies  
**Infrastructure:** 2x Tesla V100  
**Code:**  See [example](https://github.com/adityaarunsinghal/Domain-Adaptation/blob/master/shell_scripts/train_movie_roberta.sh)    

## Hyperparameters
```
num_examples = 4346
batch_size = 16
n_epochs = 3
base_LM_model = "roberta-base"
learning_rate = 5e-05
max_query_length = 64
gradient_accumulation_steps = 1
total_optimization_steps = 816
evaluation_strategy = IntervalStrategy.NO
prediction_loss_only = False
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
adam_beta1 = 0.9
adam_beta2 = 0.999
adam_epsilon = 1e-08
max_grad_norm = 1.0
lr_scheduler_type = SchedulerType.LINEAR
warmup_ratio = 0.0
seed = 42
eval_steps = 500
metric_for_best_model = None
greater_is_better = None
label_smoothing_factor = 0.0
``` 
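
For reference, this is roughly how those settings map onto the `transformers` Trainer API. This is a minimal sketch, not the exact training script (see the linked shell script for that); the output directory name and the 15% masking probability are assumptions here.

```
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Standard MLM objective: randomly mask tokens and predict them.
# The 15% masking probability is the RoBERTa default, assumed here.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="roberta-base-wikimovies",  # assumed output path
    num_train_epochs=3,
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    max_grad_norm=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.0,
    seed=42,
    eval_steps=500,
    label_smoothing_factor=0.0,
)
```

These arguments, together with tokenized WikiMovies train/eval splits and the collator above, would then be passed to a `Trainer` and run with `trainer.train()`.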
## Performance

perplexity = 4.3808
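
For context, MLM perplexity is the exponential of the mean held-out cross-entropy loss; the loss value below is back-computed from the reported number, purely for illustration:

```
import math

eval_loss = 1.4772           # illustrative value implied by the reported perplexity
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # ~4.38
```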

Some of my work: 
- [Domain-Adaptation Project](https://github.com/adityaarunsinghal/Domain-Adaptation/)

---