kazandaev committed
Commit c628253
1 Parent(s): 1fcfed2

Training complete

Files changed (1)
  1. README.md +22 -29
README.md CHANGED
@@ -1,41 +1,26 @@
 ---
 license: mit
-base_model: kazandaev/m2m100_418M
+base_model: facebook/m2m100_418M
 tags:
 - translation
 - generated_from_trainer
-datasets:
-- wmt16
 metrics:
 - bleu
 model-index:
-- name: m2m100_418M
-  results:
-  - task:
-      name: Sequence-to-sequence Language Modeling
-      type: text2text-generation
-    dataset:
-      name: wmt16
-      type: wmt16
-      config: ru-en
-      split: validation
-      args: ru-en
-    metrics:
-    - name: Bleu
-      type: bleu
-      value: 32.0585
+- name: m2m100_418M-finetuned-en-ru
+  results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# m2m100_418M
+# m2m100_418M-finetuned-en-ru
 
-This model is a fine-tuned version of [kazandaev/m2m100_418M](https://huggingface.co/kazandaev/m2m100_418M) on the wmt16 dataset.
+This model is a fine-tuned version of [facebook/m2m100_418M](https://huggingface.co/facebook/m2m100_418M) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.8954
-- Bleu: 32.0585
-- Gen Len: 36.1643
+- Loss: 0.6798
+- Bleu: 51.7753
+- Gen Len: 61.5271
 
 ## Model description
 
@@ -56,20 +41,28 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 - train_batch_size: 4
-- eval_batch_size: 2
+- eval_batch_size: 3
 - seed: 42
-- gradient_accumulation_steps: 10
-- total_train_batch_size: 40
+- gradient_accumulation_steps: 50
+- total_train_batch_size: 200
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 2
+- num_epochs: 10
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
 |:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
-| 0.8087 | 1.0 | 47790 | 0.9542 | 30.786 | 36.1469 |
-| 0.7266 | 2.0 | 95580 | 0.8954 | 32.0585 | 36.1643 |
+| 0.8981 | 1.0 | 9218 | 0.8277 | 48.5181 | 61.4701 |
+| 0.8167 | 2.0 | 18437 | 0.7696 | 49.6956 | 61.4641 |
+| 0.7718 | 3.0 | 27656 | 0.7396 | 50.4045 | 61.5042 |
+| 0.7373 | 4.0 | 36875 | 0.7203 | 50.8134 | 61.5313 |
+| 0.713 | 5.0 | 46094 | 0.7069 | 51.136 | 61.5931 |
+| 0.6937 | 6.0 | 55312 | 0.6967 | 51.4274 | 61.5911 |
+| 0.6787 | 7.0 | 64531 | 0.6889 | 51.5404 | 61.5012 |
+| 0.6659 | 8.0 | 73750 | 0.6844 | 51.7187 | 61.5229 |
+| 0.6584 | 9.0 | 82969 | 0.6809 | 51.8218 | 61.5392 |
+| 0.649 | 10.0 | 92180 | 0.6798 | 51.7753 | 61.5271 |
 
 
 ### Framework versions
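
For reference, the updated hyperparameters in this commit correspond roughly to a `Seq2SeqTrainingArguments` setup like the sketch below. This is an illustration only: the training script is not part of the card, so the output directory, evaluation strategy, and the `model`/`tokenizer`/dataset objects are placeholders, and the effective batch size of 200 assumes 4 per device × 50 accumulation steps on a single device.

```python
# Minimal sketch reconstructing the listed hyperparameters with transformers'
# Seq2SeqTrainer API. Values marked "assumed" are not stated in the card.
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

training_args = Seq2SeqTrainingArguments(
    output_dir="m2m100_418M-finetuned-en-ru",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=50,   # 4 * 50 = 200 effective batch on one device
    num_train_epochs=10,
    lr_scheduler_type="linear",
    seed=42,
    predict_with_generate=True,       # required to compute BLEU / Gen Len at eval; assumed
    evaluation_strategy="epoch",      # matches the per-epoch results table; assumed
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_ds,          # placeholder: the dataset is not named in the card
#     eval_dataset=eval_ds,
#     tokenizer=tokenizer,
#     compute_metrics=compute_metrics, # e.g. sacreBLEU-based metric function
# )
# trainer.train()
```

The optimizer line in the card (Adam, betas=(0.9, 0.999), epsilon=1e-08) matches the Trainer defaults, so no explicit optimizer argument is shown.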
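
Since the card itself carries no usage example, the following sketch shows the standard M2M100 translation pattern from the transformers documentation applied to this en→ru checkpoint. The repository id is an assumption inferred from the model name and committer; replace it with the actual repo.

```python
# Sketch only: standard M2M100 inference, forcing Russian as the target language.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

repo_id = "kazandaev/m2m100_418M-finetuned-en-ru"  # assumed repo id, not stated in the diff
tokenizer = M2M100Tokenizer.from_pretrained(repo_id)
model = M2M100ForConditionalGeneration.from_pretrained(repo_id)

tokenizer.src_lang = "en"  # source language: English
inputs = tokenizer("Training is complete.", return_tensors="pt")

# Force Russian output via its BOS language token.
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("ru"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```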