A GPT-NeoX model trained on MiniPile, serving as a baseline against which to compare my MANN models. It uses NeelNanda/gpt-neox-tokenizer-digits for tokenization.
The exact model configuration is as follows:
```python
from transformers import AutoTokenizer, GPTNeoXConfig

# Tokenizer used for training; vocab_size is derived from it below
tokenizer = AutoTokenizer.from_pretrained("NeelNanda/gpt-neox-tokenizer-digits")

cfg = GPTNeoXConfig(
    vocab_size=len(tokenizer),
    hidden_size=768,
    intermediate_size=768 * 4,
    num_hidden_layers=12,
    num_attention_heads=12,
    tie_word_embeddings=True,
    hidden_act="gelu_new",
)
```
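As a quick sanity check, this config can be instantiated into a model directly. The sketch below assumes only the transformers library and says nothing about the actual training setup:

```python
from transformers import GPTNeoXForCausalLM

# Randomly initialized model built from the config above
model = GPTNeoXForCausalLM(cfg)

# Print the total parameter count as a quick sanity check
print(f"{model.num_parameters():,} parameters")
```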
# Open LLM Leaderboard Evaluation Results
Detailed results can be found here.
| Metric | Value |
|---|---|
| Avg. | 25.1 |
| ARC (25-shot) | 20.73 |
| HellaSwag (10-shot) | 27.03 |
| MMLU (5-shot) | 25.31 |
| TruthfulQA (0-shot) | 49.19 |
| Winogrande (5-shot) | 52.33 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 1.09 |
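A run like this can in principle be reproduced locally with EleutherAI's lm-evaluation-harness. The sketch below is an illustration, not the leaderboard's exact pipeline: the model ID is a hypothetical placeholder, and the leaderboard pins a specific harness version, so scores may not match exactly.

```python
# Sketch only: reproducing the 25-shot ARC score with lm-evaluation-harness.
# Assumes `pip install lm-eval`; "your-username/gpt-neox-minipile" is a
# hypothetical placeholder for the actual model repository.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-username/gpt-neox-minipile",
    tasks=["arc_challenge"],
    num_fewshot=25,  # the leaderboard evaluates ARC with 25-shot prompting
)
print(results["results"]["arc_challenge"])
```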