crumb commited on
Commit
1fe6c41
1 Parent(s): b9189f6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -5,7 +5,7 @@ language:
5
  - en
6
  ---
7
 
8
- # broken, let me reimplement and train
9
 
10
  GLORT2 (GLORT2 Low Rank Transformer Transformer) is a transformer model where every single linear layer is another smaller transformer model. I combined qkv into one operation which means one transformer instead of 3 to save on parameters, I played w using a transformer on the embeddings but it wasnt .. great, it's 768 dim 10 layers w/ 384 dim 1 layer as the replacements for linear layers (besides embed and lm head)
11
 
 
5
  - en
6
  ---
7
 
8
+ # broken bc of updates to transformers library, let me reimplement and train
9
 
10
  GLORT2 (GLORT2 Low Rank Transformer Transformer) is a transformer model where every single linear layer is another smaller transformer model. I combined qkv into one operation which means one transformer instead of 3 to save on parameters, I played w using a transformer on the embeddings but it wasnt .. great, it's 768 dim 10 layers w/ 384 dim 1 layer as the replacements for linear layers (besides embed and lm head)
11