reshinthadith committed
Commit 46ef235 • 1 Parent(s): 004f712
Update README.md
README.md CHANGED
@@ -66,7 +66,7 @@ The first pre-training stage relies on 300B tokens sourced from various top prog
 
 ### Training Procedure
 
-The model is pre-trained on the dataset mixes mentioned above in mixed-precision (BF16), optimized with AdamW, and trained using the
+The model is pre-trained on the dataset mixes mentioned above in mixed-precision (BF16), optimized with AdamW, and trained using the StarCoder tokenizer with a vocabulary size of 49k.
 
 * **Software**: We use a fork of gpt-neox ([EleutherAI, 2021](https://github.com/EleutherAI/gpt-neox)) and train under 2D parallelism (Data and Tensor Parallel) with ZeRO-1 ([Rajbhandari et al., 2019](https://arxiv.org/abs/1910.02054v3)) and rely on flash-attention as well as rotary embedding kernels from FlashAttention-2 ([Dao et al., 2023](https://tridao.me/publications/flash2/flash2.pdf)).
 
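For readers of the updated line, the sketch below is a minimal, illustrative recreation of the ingredients it names: the StarCoder tokenizer (≈49k-token vocabulary), AdamW, and BF16 mixed precision. It is not the actual training code, which runs in the gpt-neox fork described in the **Software** bullet; the tokenizer repo id `bigcode/starcoder`, the tiny stand-in model, and all hyperparameters are assumptions made for the example.

```python
# Illustrative only: a tiny stand-in model and assumed hyperparameters, not the
# gpt-neox setup described above (no tensor parallelism, ZeRO-1, or
# FlashAttention-2 kernels here).
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

# StarCoder tokenizer (≈49k vocabulary). The repo id is an assumption; the files
# may require accepting the model license on the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
print(len(tokenizer))  # ≈49k tokens

# Tiny randomly initialised causal LM as a placeholder for the real model.
config = GPT2Config(vocab_size=len(tokenizer), n_positions=256, n_embd=128, n_layer=2, n_head=2)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel(config).to(device)

# AdamW; lr, betas, and weight decay are assumed values, not taken from the model card.
optimizer = AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.95), weight_decay=0.1)

batch = tokenizer("def add(a, b):\n    return a + b", return_tensors="pt").to(device)
labels = batch["input_ids"].clone()

# BF16 mixed precision: compute in bfloat16 under autocast, keep FP32 master weights.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The real run differs in exactly the ways the **Software** bullet lists: it is launched through the gpt-neox fork with data and tensor parallelism, ZeRO-1 sharded optimizer states, and FlashAttention-2 attention and rotary-embedding kernels, none of which this single-device sketch reproduces.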