Haiyang-W committed on
Commit bbf155c · verified · 1 Parent(s): e25580d

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0

  The *TokenFormer* is a **fully attention-based architecture**
  that unifies the computations of token-token and token-parameter interactions
- by entirely employing the attention mechanism, **maximizes the flexibility of neural network**.[(see paper)](https://github.com/Haiyang-W/TokenFormer).
+ by entirely employing the attention mechanism, **maximizes the flexibility of neural network**.[(see paper)](https://arxiv.org/pdf/2410.23168).
  It contains four models of sizes
  150M, 450M, 900M, 1.5B. For each size, it's trained based on [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base and uses [Pile](https://huggingface.co/datasets/EleutherAI/pile) with 300B tokens.
  All 4 model sizes are trained on the exact
@@ -19,7 +19,7 @@ same data, in the exact same order.
  - Language: English
  - Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer)
  for training procedure, config files, and details on how to use.
- [See paper](https://github.com/Haiyang-W/TokenFormer) for more evals and implementation
+ [See paper](https://arxiv.org/pdf/2410.23168) for more evals and implementation
  details.
  - Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
  - License: Apache 2.0
@@ -68,7 +68,7 @@ TokenFormer uses the same tokenizer as [GPT-NeoX-

  ## Evaluations

- All 16 *TokenFormer* models were evaluated using the [LM Evaluation
+ All *TokenFormer* models were evaluated using the [LM Evaluation
  Harness](https://github.com/EleutherAI/lm-evaluation-harness).
  You can run the evaluation with our [instruction](https://github.com/Haiyang-W/TokenFormer?tab=readme-ov-file#evaluations).<br>
  Expand the sections below to see plots of evaluation results for all