pszemraj committed
Commit c9099af
Parent: 38e2c9d

Update README.md

Files changed (1): README.md (+11 -17)

README.md CHANGED
```diff
@@ -1,7 +1,6 @@
 ---
 language:
 - en
-base_model: pszemraj/MiniLMv2-L6-H384_R-simplewiki
 tags:
 - generated_from_trainer
 metrics:
@@ -11,28 +10,23 @@ datasets:
 - BEE-spoke-data/fineweb-100k_en-med
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+# MiniLMv2-L6-H384_R-fineweb-100k
 
-# MiniLMv2-L6-H384_R-simplewiki-fineweb-100k_en-med_512-vN
+This is a MiniLMv2 model pretrained further on an MLM task, with the goal of improving downstream fine-tuning performance:
 
-This model is a fine-tuned version of [pszemraj/MiniLMv2-L6-H384_R-simplewiki](https://huggingface.co/pszemraj/MiniLMv2-L6-H384_R-simplewiki) on the BEE-spoke-data/fineweb-100k_en-med dataset.
-It achieves the following results on the evaluation set:
-- Loss: 4.0206
-- Accuracy: 0.3783
-- Num Input Tokens Seen: 162790400
+- activation updated to SiLU prior to further training
+- MLM @ 40% mask ratio
 
 ## Model description
 
-More information needed
+This model is a fine-tuned version of [nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large) on the BEE-spoke-data/fineweb-100k_en-med dataset.
 
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
+It achieves the following results on the evaluation set:
+- Loss: 4.0206
+- Accuracy: 0.3783
+- Num Input Tokens Seen: 162790400
 
 ## Training procedure
```
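The updated card states that the activation was switched to SiLU before continued pretraining. For reference, SiLU (also known as Swish) is simply x · sigmoid(x); the sketch below defines it from scratch. Note that for a `transformers`-style checkpoint, this kind of swap typically amounts to setting `hidden_act = "silu"` in the model config before training — the card does not describe the exact mechanism used, so that detail is an assumption.

```python
import math

def silu(x: float) -> float:
    """SiLU / Swish activation: x * sigmoid(x).

    Smooth and non-monotonic near zero, unlike ReLU; approaches
    the identity for large positive x and 0 for large negative x.
    """
    return x / (1.0 + math.exp(-x))

# In a transformers config, the equivalent swap would typically be
# (assumption -- the card does not show the actual code used):
#   config = AutoConfig.from_pretrained(model_name)
#   config.hidden_act = "silu"
```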
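The card specifies MLM pretraining at a 40% mask ratio (versus the conventional 15%). A minimal pure-Python sketch of BERT-style masking at that ratio, assuming the standard 80/10/10 corruption split — the card only states the ratio, so the split, the `[MASK]` id, and the vocabulary size here are illustrative:

```python
import random

MASK_ID = 4          # hypothetical [MASK] token id
VOCAB_SIZE = 30000   # hypothetical vocabulary size

def mask_for_mlm(token_ids, mask_ratio=0.4, seed=0):
    """Return (inputs, labels) for masked-language-model training.

    A mask_ratio fraction of positions is selected for prediction;
    each selected position follows the standard BERT recipe:
    80% -> [MASK], 10% -> random token, 10% -> left unchanged.
    Labels are -100 (ignored by cross-entropy) at unselected positions.
    """
    rng = random.Random(seed)
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_ratio:
            labels.append(tok)                  # model must predict the original
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK_ID             # replace with [MASK]
            elif roll < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # random corruption
            # else: keep the original token, but still predict it
        else:
            labels.append(-100)                 # position excluded from the loss
    return inputs, labels
```

In practice the same behavior is available via `transformers.DataCollatorForLanguageModeling(tokenizer=..., mlm=True, mlm_probability=0.4)`, which handles batching and special tokens as well.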