pszemraj committed
Commit c9099af
Parent: 38e2c9d

Update README.md

Files changed (1): README.md (+11 -17)

README.md CHANGED
```diff
@@ -1,7 +1,6 @@
 ---
 language:
 - en
-base_model: pszemraj/MiniLMv2-L6-H384_R-simplewiki
 tags:
 - generated_from_trainer
 metrics:
@@ -11,28 +10,23 @@ datasets:
 - BEE-spoke-data/fineweb-100k_en-med
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+# MiniLMv2-L6-H384_R-fineweb-100k
 
-# MiniLMv2-L6-H384_R-simplewiki-fineweb-100k_en-med_512-vN
+This is a MiniLMv2 model pretrained further on an MLM task, with the goal of improving downstream fine-tuning performance:
 
-This model is a fine-tuned version of [pszemraj/MiniLMv2-L6-H384_R-simplewiki](https://huggingface.co/pszemraj/MiniLMv2-L6-H384_R-simplewiki) on the BEE-spoke-data/fineweb-100k_en-med dataset.
-It achieves the following results on the evaluation set:
-- Loss: 4.0206
-- Accuracy: 0.3783
-- Num Input Tokens Seen: 162790400
+- activation updated to SiLU prior to further training
+- MLM @ 40% mask ratio
 
 ## Model description
 
-More information needed
+This model is a fine-tuned version of [nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large) on the BEE-spoke-data/fineweb-100k_en-med dataset.
 
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
+It achieves the following results on the evaluation set:
+- Loss: 4.0206
+- Accuracy: 0.3783
+- Num Input Tokens Seen: 162790400
 
 ## Training procedure
```
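The updated card states that the activation was switched to SiLU before continued pretraining. For reference, SiLU (also known as Swish) is simply x · sigmoid(x); the sketch below defines it from scratch. Note that for a `transformers`-style checkpoint, this kind of swap typically amounts to setting `hidden_act = "silu"` in the model config before training — the card does not describe the exact mechanism used, so that detail is an assumption.

```python
import math

def silu(x: float) -> float:
    """SiLU / Swish activation: x * sigmoid(x).

    Smooth and non-monotonic near zero, unlike ReLU; approaches
    the identity for large positive x and 0 for large negative x.
    """
    return x / (1.0 + math.exp(-x))

# In a transformers config, the equivalent swap would typically be
# (assumption -- the card does not show the actual code used):
#   config = AutoConfig.from_pretrained(model_name)
#   config.hidden_act = "silu"
```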
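The card specifies MLM pretraining at a 40% mask ratio (versus the conventional 15%). A minimal pure-Python sketch of BERT-style masking at that ratio, assuming the standard 80/10/10 corruption split — the card only states the ratio, so the split, the `[MASK]` id, and the vocabulary size here are illustrative:

```python
import random

MASK_ID = 4          # hypothetical [MASK] token id
VOCAB_SIZE = 30000   # hypothetical vocabulary size

def mask_for_mlm(token_ids, mask_ratio=0.4, seed=0):
    """Return (inputs, labels) for masked-language-model training.

    A mask_ratio fraction of positions is selected for prediction;
    each selected position follows the standard BERT recipe:
    80% -> [MASK], 10% -> random token, 10% -> left unchanged.
    Labels are -100 (ignored by cross-entropy) at unselected positions.
    """
    rng = random.Random(seed)
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_ratio:
            labels.append(tok)                  # model must predict the original
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK_ID             # replace with [MASK]
            elif roll < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # random corruption
            # else: keep the original token, but still predict it
        else:
            labels.append(-100)                 # position excluded from the loss
    return inputs, labels
```

In practice the same behavior is available via `transformers.DataCollatorForLanguageModeling(tokenizer=..., mlm=True, mlm_probability=0.4)`, which handles batching and special tokens as well.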