nilq committed on
Commit 524aad8 · verified · 1 Parent(s): 5be0acc

Update README.md

Files changed (1)
  1. README.md +7 -10
README.md CHANGED
@@ -20,11 +20,9 @@ model-index:
  value: 0.5792084706530948
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # mistral-1L-tiny
 
+ A tiny single-layer 35.1M parameter Mistral model, with a hidden size of 512, and an MLP intermediate size of 1024.
  This model is trained on the roneneldan/TinyStories dataset.
  It achieves the following results on the evaluation set:
  - Loss: 1.6868
@@ -32,18 +30,17 @@ It achieves the following results on the evaluation set:
 
  ## Model description
 
- More information needed
+ This work is inspired by the 21M parameter one-layer GPT-Neo of the [Tiny Stories paper](https://arxiv.org/abs/2305.07759).
+ Results reproduced to acquire high-frequency checkpoints for further analysis.
 
  ## Intended uses & limitations
 
- More information needed
-
- ## Training and evaluation data
-
- More information needed
+ Analysis of feature dynamics and emergence in real-world language models.
 
  ## Training procedure
 
+ Trained for 90171 steps, corresponding to ~2 hours on a single H100.
+
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
@@ -57,7 +54,7 @@ The following hyperparameters were used during training:
 
  ### Training results
 
-
+ Quite consistent English text generation.
 
  ### Framework versions
 
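
The description added in this commit gives three architectural facts: one layer, a hidden size of 512, and an MLP intermediate size of 1024. As a rough sanity check on the stated 35.1M parameters, the sketch below counts the weights of a Mistral-style decoder. Only those three values come from the model card. The vocabulary size (32000), the head counts (16 query heads, 8 key/value heads), and the untied input/output embeddings are assumptions chosen for illustration, and the function name `mistral_param_count` is hypothetical. The model's actual configuration may differ.

```python
def mistral_param_count(
    hidden_size: int = 512,         # from the model card
    intermediate_size: int = 1024,  # from the model card
    num_layers: int = 1,            # from the model card
    vocab_size: int = 32000,        # ASSUMPTION: not stated in the model card
    num_heads: int = 16,            # ASSUMPTION: not stated in the model card
    num_kv_heads: int = 8,          # ASSUMPTION: grouped-query attention
    tie_embeddings: bool = False,   # ASSUMPTION: separate input/output embeddings
) -> int:
    """Count the weights of a Mistral-style decoder (no bias terms)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim

    # Token embedding and (optionally separate) output projection.
    embed = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size

    # Attention: q and o are hidden x hidden, k and v are hidden x kv_dim.
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer, plus one final norm.
    layer_norms = 2 * hidden_size
    final_norm = hidden_size

    return embed + lm_head + num_layers * (attn + mlp + layer_norms) + final_norm


if __name__ == "__main__":
    total = mistral_param_count()
    print(total)                 # 35128832
    print(f"{total / 1e6:.1f}M")  # 35.1M
```

Under these assumed values the count rounds to 35.1M, with the two embedding matrices accounting for about 32.8M of the total. This only shows that the stated size is consistent with one plausible configuration, not that it is the configuration used.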