- Code release: https://github.com/microsoft/torchscale
- March 2022: released the preprint [DeepNet: Scaling Transformers to 1,000 Layers](https://arxiv.org/abs/2203.00555)

```
@article{deepnet,
  author    = {Hongyu Wang and Shuming Ma and Li Dong and Shaohan Huang and Dongdong Zhang and Furu Wei},
  title     = {{DeepNet}: Scaling {Transformers} to 1,000 Layers},
  journal   = {CoRR},
  volume    = {abs/2203.00555},
  year      = {2022},
}
```