JingzeShi committed
Commit 9735776
Parent: bb37c56

Update README.md (#1)

- Update README.md (966005eb11ac4e6532750f1b31bcff51c563a6b0)

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: text-generation
 ---
 
 
-# **Doge 20M**
+# **Doge 60M**
 
 Doge is an ongoing research project where we aim to train a series of small language models to further explore whether the Transformer framework allows for more complex feedforward network structures, enabling the model to have fewer cache states and larger knowledge capacity.
 
@@ -21,8 +21,8 @@ In addition, Doge uses Dynamic Mask Attention as sequence transformation and can
 ```python
 >>> from transformers import AutoTokenizer, AutoModelForCausalLM
 
->>> tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-20M")
->>> model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-20M", trust_remote_code=True)
+>>> tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-60M")
+>>> model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-60M", trust_remote_code=True)
 >>> inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
 
 >>> out = model.generate(**inputs, max_new_tokens=100)
@@ -39,14 +39,14 @@ In addition, Doge uses Dynamic Mask Attention as sequence transformation and can
 **Training**:
 | Model | Training Data | Epochs | Steps | Content Length | Tokens | LR | Batch Size | Precision |
 |---|---|---|---|---|---|---|---|---|
-| [Doge-20M](https://huggingface.co/LoserCheems/Doge-20M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 2 | 10k | 2048 | 5B | 8e-4 | 0.25M | bfloat16 |
-| [Doge-60M](https://huggingface.co/LoserCheems/Doge-60M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 2 | 20k | 2048 | 20B | 6e-4 | 0.5M | bfloat16 |
+| [Doge-20M](https://huggingface.co/JingzeShi/Doge-20M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 2 | 10k | 2048 | 5B | 8e-4 | 0.25M | bfloat16 |
+| [Doge-60M](https://huggingface.co/JingzeShi/Doge-60M) | [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) | 2 | 20k | 2048 | 20B | 6e-4 | 0.5M | bfloat16 |
 
 **Evaluation**:
 | Model | TriviaQA | MMLU | ARC | PIQA | HellaSwag | OBQA | Winogrande |
 |---|---|---|---|---|---|---|---|
-| [Doge-20M](https://huggingface.co/LoserCheems/Doge-20M) | - | 26.01 | 36.15 | 56.26 | 26.60 | 26.60 | 50.12 |
-| [Doge-60M](https://huggingface.co/LoserCheems/Doge-60M) | - | 25.81 | 45.49 | 61.37 | 29.65 | 27.40 | 52.57 |
+| [Doge-20M](https://huggingface.co/JingzeShi/Doge-20M) | - | 26.01 | 36.15 | 56.26 | 26.60 | 26.60 | 50.12 |
+| [Doge-60M](https://huggingface.co/JingzeShi/Doge-60M) | - | 25.81 | 45.49 | 61.37 | 29.65 | 27.40 | 52.57 |
 
 **Environment**:
 - Image: nvcr.io/nvidia/pytorch:24.10-py3
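
For reference, the updated snippet assembled from the new side of the diff as a runnable end-to-end example. This is a minimal sketch: the final decode/print step does not appear in this commit and is added here as an assumption about typical usage.

```python
# Minimal sketch based on the updated README snippet (Doge-60M).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-60M")
model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-60M", trust_remote_code=True)

# Tokenize a prompt and generate up to 100 new tokens, as in the README example
inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)

# Decode the generated ids back to text (assumed step, not shown in the diff)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```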