architext
/

gptj-162M

Text Generation

Inference Endpoints

Model card Files Files and versions Community

THEODOROS commited on Sep 16, 2022

Commit

e2e3cdc

•

1 Parent(s): b8a5289

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -5,7 +5,7 @@ Architext GPT-J-162M is a transformer model trained using Ben Wang's Mesh Transf
 The model consists of 12 layers with a model dimension of 768, and a feedforward dimension of 2048. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3.
-#Training data
 GPT-J 162B was pre-trained on the Pile, a large-scale curated dataset created by EleutherAI. It was then finetuned on synthetically generated data that was procedurally generated using the Rhinocers/Grasshopper software suite. The model was finetuned for 1.25 billion tokens over 11,500 steps on TPU v3-8. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
 #Intended Use and Limitations

 The model consists of 12 layers with a model dimension of 768, and a feedforward dimension of 2048. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3.
+# Training data
 GPT-J 162B was pre-trained on the Pile, a large-scale curated dataset created by EleutherAI. It was then finetuned on synthetically generated data that was procedurally generated using the Rhinocers/Grasshopper software suite. The model was finetuned for 1.25 billion tokens over 11,500 steps on TPU v3-8. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
 #Intended Use and Limitations