Update README.md (#1)
Update README.md (966005eb11ac4e6532750f1b31bcff51c563a6b0)
README.md
CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: text-generation
 ---
 
 
-# **Doge
+# **Doge 60M**
 
 Doge is an ongoing research project where we aim to train a series of small language models to further explore whether the Transformer framework allows for more complex feedforward network structures, enabling the model to have fewer cache states and larger knowledge capacity.
 
@@ -21,8 +21,8 @@ In addition, Doge uses Dynamic Mask Attention as sequence transformation and can
 ```python
 >>> from transformers import AutoTokenizer, AutoModelForCausalLM
 
->>> tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-
->>> model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-
+>>> tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-60M")
+>>> model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-60M", trust_remote_code=True)
 >>> inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
 
 >>> out = model.generate(**inputs, max_new_tokens=100)
@@ -39,14 +39,14 @@ In addition, Doge uses Dynamic Mask Attention as sequence transformation and can
 **Training**:
 | Model | Training Data | Epochs | Steps | Content Length | Tokens | LR | Batch Size | Precision |
 |---|---|---|---|---|---|---|---|---|
-| [Doge-20M](https://huggingface.co/
-| [Doge-60M](https://huggingface.co/
+| [Doge-20M](https://huggingface.co/JingzeShi/Doge-20M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 2 | 10k | 2048 | 5B | 8e-4 | 0.25M | bfloat16 |
+| [Doge-60M](https://huggingface.co/JingzeShi/Doge-60M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 2 | 20k | 2048 | 20B | 6e-4 | 0.5M | bfloat16 |
 
 **Evaluation**:
 | Model | TriviaQA | MMLU | ARC | PIQA | HellaSwag | OBQA | Winogrande |
 |---|---|---|---|---|---|---|---|
-| [Doge-20M](https://huggingface.co/
-| [Doge-60M](https://huggingface.co/
+| [Doge-20M](https://huggingface.co/JingzeShi/Doge-20M) | - | 26.01 | 36.15 | 56.26 | 26.60 | 26.60 | 50.12 |
+| [Doge-60M](https://huggingface.co/JingzeShi/Doge-60M) | - | 25.81 | 45.49 | 61.37 | 29.65 | 27.40 | 52.57 |
 
 **Environment**:
 - Image: nvcr.io/nvidia/pytorch:24.10-py3
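
Not part of the diff above, but for readers trying the updated snippet: a minimal sketch of decoding the generated ids back into text. `AutoTokenizer`, `AutoModelForCausalLM`, `generate`, and `batch_decode` are standard `transformers` APIs; stripping the prompt tokens before decoding is just one common way to print only the continuation, not something the card prescribes.

```python
# Illustrative sketch (not part of the card's diff): load Doge-60M as shown in
# the updated README, generate, and decode the new tokens back into a string.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-60M")
model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-60M", trust_remote_code=True)

inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens (everything after the prompt).
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.batch_decode(out[:, prompt_len:], skip_special_tokens=True)[0])
```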
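
One reading of the training table that appears consistent with its numbers (an inference, not stated in the card): the Tokens column equals Epochs × Steps × Batch Size, with batch size counted in tokens, e.g. 2 × 20k × 0.5M = 20B for Doge-60M. A quick check under that assumption:

```python
# Hypothetical sanity check of the training-table figures,
# assuming Tokens = Epochs * Steps * Batch Size (batch size in tokens).
rows = {
    "Doge-20M": (2, 10_000, 0.25e6, 5e9),
    "Doge-60M": (2, 20_000, 0.5e6, 20e9),
}
for name, (epochs, steps, batch_tokens, listed_tokens) in rows.items():
    computed = epochs * steps * batch_tokens
    print(name, computed == listed_tokens)  # prints True for both rows
```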