Update README.md (#1)
Commit 8d7c7d03c10e2ab6c594a3337b34aa459c458161
README.md CHANGED
@@ -18,7 +18,6 @@ Doge is an ongoing research project where we aim to train a series of small lang
 In addition, Doge uses Dynamic Mask Attention as sequence transformation and can use Multi-Layer Perceptron or Cross Domain Mixture of Experts as state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and state space during inference, and Cross Domain Mixture of Experts can directly inherit the weights of Multi-Layer Perceptron for further training. This model was trained by Jingze Shi; it accepts only text input and produces only text output. For the detailed algorithm and model architecture, please refer to [Wonderful Matrices](https://arxiv.org/abs/2412.11834); the ongoing research repository is [Wonderful Matrices](https://github.com/LoserCheems/WonderfulMatrices).
 
 
-
 ## Uses
 
 ```python
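# NOTE: the diff elides the body of the "## Uses" snippet (the next hunk header shows
# it ends around `outputs = model.generate(`). What follows is only a hedged sketch of
# such usage with the standard transformers API, NOT the README's exact code: the repo
# id JingzeShi/Doge-20M-Instruct is taken from the training table below,
# trust_remote_code=True is assumed for the custom Doge architecture, and the sampling
# settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-20M-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-20M-Instruct", trust_remote_code=True)

# Build a chat-formatted prompt; the instruct checkpoint is assumed to ship a chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi, how are you doing today?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short reply and strip the prompt tokens before decoding.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))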
@@ -60,17 +59,18 @@ outputs = model.generate(
 
 > TODO: The larger model is under training and will be uploaded soon.
 
-
-
+**Training**:
+| Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
 |---|---|---|---|---|---|---|
-| [Doge-20M-Instruct](https://huggingface.co/
+| [Doge-20M-Instruct](https://huggingface.co/JingzeShi/Doge-20M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 8192 | 8e-5 | 1M | bfloat16 |
+| [Doge-60M-Instruct](https://huggingface.co/JingzeShi/Doge-60M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 8192 | 6e-5 | 1M | bfloat16 |
 
-
-**Training Environment**:
+**Environment**:
 - Image: nvcr.io/nvidia/pytorch:24.10-py3
 - Hardware: 1x NVIDIA RTX 4090
 - Software: Transformers, TRL
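
The table and environment added above pin down most of the SFT recipe (smoltalk data, 2 epochs, 8192-token sequences, bfloat16, Transformers + TRL). Below is a minimal, hedged sketch of how such a run could be wired up with TRL's `SFTTrainer`; it is not the authors' training script, and the base checkpoint name, dataset config, and batch/accumulation split are assumptions chosen to roughly match the 1M-token batch size in the table.

```python
# Hedged sketch only: not the authors' script. Hyperparameters mirror the
# Doge-20M-Instruct row above; everything else (base checkpoint name, dataset
# config, batch/accumulation split) is an assumption.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base = "JingzeShi/Doge-20M"  # assumed base checkpoint for the instruct model
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)

# HuggingFaceTB/smoltalk is the training data listed in the table.
dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

config = SFTConfig(
    output_dir="./doge-20m-instruct-sft",
    num_train_epochs=2,               # Epochs column
    learning_rate=8e-5,               # LR column (8e-5 for 20M, 6e-5 for 60M)
    max_seq_length=8192,              # Content Length column
    bf16=True,                        # Precision column
    per_device_train_batch_size=1,    # assumed split of the ~1M-token batch:
    gradient_accumulation_steps=128,  # 8192 tokens x 128 steps ~= 1M tokens
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL releases
)
trainer.train()
```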
 
+
 ## Citation
 
 ```bibtex