JingzeShi committed on
Commit 24a7b6a (1 parent: 87c6f1c)

Update README.md

Files changed (1)
  1. README.md +3 -14
README.md CHANGED
@@ -38,8 +38,9 @@ In addition, Doge uses Inner Function Attention with Dynamic Mask as sequence tr
 
  || Training Data | Steps | Content Length | Tokens | LR | Batch Size | Precision |
  |---|---|---|---|---|---|---|---|
- | Doge-22M | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 5k | 2048 | 1B | 8e-4 | 0.25M | bfloat16 |
- | Doge-76M | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 10k | 2048 | 5B | 6e-4 | 0.5M | bfloat16 |
+ | [Doge-22M](https://huggingface.co/LoserCheems/Doge-22M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 5k | 2048 | 1B | 8e-4 | 0.25M | bfloat16 |
+ | [Doge-76M](https://huggingface.co/JingzeShi/Doge-76M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 10k | 2048 | 5B | 6e-4 | 0.5M | bfloat16 |
+ | [Doge-197M](https://huggingface.co/JingzeShi/Doge-197M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 20k | 2048 | 20B | 5e-4 | 1M | bfloat16 |
 
 
  **Training Environment**:
@@ -47,18 +48,6 @@ In addition, Doge uses Inner Function Attention with Dynamic Mask as sequence tr
  - Hardware: 1x NVIDIA RTX 4090
  - Software: Transformers
 
-
- **Evaluation Results**:
-
- | Model | MMLU | TriviaQA | ARC | PIQA | Hellaswag | OBQA | Wnogrande | Avg |
- |-------|------|----------|-----|------|-----------|------|-----------|-----|
- | TinyStories-28M | 24.03 | 0.01 | 27.69 | 53.21 | 27.32 | 21.00 | 50.67 | 29.13 |
- | **Doge-22M** | 23.11 | 0.00 | 31.77 | 53.10 | 25.29 | 24.40 | 49.56 | 29.60 |
- | **Doge-76M** | 23.26 | 0.05 | 37.16 | 56.31 | 27.68 | 27.00 | 49.64 | 31.58 |
- | GPT2-137M | 26.29 | 0.49 | 31.09 | 62.51 | 29.76 | 29.40 | 49.72 | 32.75 |
- | Pythia-160M | 26.68 | 0.34 | 31.92 | 61.64 | 29.55 | 27.80 | 49.49 | 32.49 |
- | SmolLM-135M | 30.23 | 4.11 | 43.99 | 69.60 | 42.30 | 33.60 | 52.70 | 39.50 |
-
  ## Citation
 
  ```bibtex