Update README.md
README.md
CHANGED
````diff
@@ -38,8 +38,9 @@ In addition, Doge uses Inner Function Attention with Dynamic Mask as sequence transformation
 
 | Model | Training Data | Steps | Content Length | Tokens | LR | Batch Size | Precision |
 |---|---|---|---|---|---|---|---|
-| Doge-22M | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 5k | 2048 | 1B | 8e-4 | 0.25M | bfloat16 |
-| Doge-76M | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 10k | 2048 | 5B | 6e-4 | 0.5M | bfloat16 |
+| [Doge-22M](https://huggingface.co/LoserCheems/Doge-22M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 5k | 2048 | 1B | 8e-4 | 0.25M | bfloat16 |
+| [Doge-76M](https://huggingface.co/JingzeShi/Doge-76M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 10k | 2048 | 5B | 6e-4 | 0.5M | bfloat16 |
+| [Doge-197M](https://huggingface.co/JingzeShi/Doge-197M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 20k | 2048 | 20B | 5e-4 | 1M | bfloat16 |
 
 
 **Training Environment**:
 
@@ -47,18 +48,6 @@ In addition, Doge uses Inner Function Attention with Dynamic Mask as sequence transformation
 - Hardware: 1x NVIDIA RTX 4090
 - Software: Transformers
 
-
-**Evaluation Results**:
-
-| Model | MMLU | TriviaQA | ARC | PIQA | Hellaswag | OBQA | Winogrande | Avg |
-|-------|------|----------|-----|------|-----------|------|-----------|-----|
-| TinyStories-28M | 24.03 | 0.01 | 27.69 | 53.21 | 27.32 | 21.00 | 50.67 | 29.13 |
-| **Doge-22M** | 23.11 | 0.00 | 31.77 | 53.10 | 25.29 | 24.40 | 49.56 | 29.60 |
-| **Doge-76M** | 23.26 | 0.05 | 37.16 | 56.31 | 27.68 | 27.00 | 49.64 | 31.58 |
-| GPT2-137M | 26.29 | 0.49 | 31.09 | 62.51 | 29.76 | 29.40 | 49.72 | 32.75 |
-| Pythia-160M | 26.68 | 0.34 | 31.92 | 61.64 | 29.55 | 27.80 | 49.49 | 32.49 |
-| SmolLM-135M | 30.23 | 4.11 | 43.99 | 69.60 | 42.30 | 33.60 | 52.70 | 39.50 |
-
 ## Citation
 
 ```bibtex
````
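The updated table links each checkpoint on the Hugging Face Hub, and the training environment lists Transformers as the software stack. As a minimal usage sketch (not part of this diff), loading one of the linked checkpoints might look like the following; the repo id is taken from the table above, while `trust_remote_code=True` and the generation settings are assumptions, since the Doge architecture is custom rather than a stock Transformers model class.

```python
# Minimal sketch (not from the diff): load a Doge checkpoint with Transformers.
# The repo id "JingzeShi/Doge-76M" is taken from the table above;
# trust_remote_code=True is an assumption, since Doge uses a custom architecture
# (Inner Function Attention with Dynamic Mask) rather than a built-in model class.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "JingzeShi/Doge-76M"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Generate a short continuation to sanity-check the checkpoint.
inputs = tokenizer("The meaning of life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```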