Model Card for gpt2-124M-edu-fineweb-10B

A 124M parameter GPT2 model trained with the 10B fineweb-edu dataset using https://github.com/karpathy/llm.c

Training took 20 hours on a single 4090 GPU (limited to 350W), giving the following graphs:

gpt2-124M-edu-fineweb-10B

Training

The training parameters where:

./train_gpt2cu \
    -i "dev/data/edu_fineweb10B/edu_fineweb_train_*.bin" \
    -j "dev/data/edu_fineweb10B/edu_fineweb_val_*.bin" \
    -o log124M \
    -e "d12" \
    -b 56 -t 1024 \
    -d 458752 \
    -r 1 \
    -z 1 \
    -c 0.1 \
    -l 0.002 \
    -q 0.0 \
    -u 700 \
    -n 5000 \
    -v 250 -s 20000 \
    -h 1

The model has had no further finetuning.

Evaluation

Evals using Eleuther AI Harness as described in the open_llm_leaderboard and comparing with those published for openai-community/gpt2

gpt2-124M-edu-fineweb-10B Evals

Eval Test Score
arc_challenge (25 shot) 24.49
gsm8k (5 shot) 0.08
hellaswag (10 shot) 32.64
mmlu (5 shot) 26.06
truthfulqa (0 shot) 42.45
winogrande (5 shot) 52.17
Overall Score 29.65
Downloads last month
128
Safetensors
Model size
124M params
Tensor type
FP16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for rhysjones/gpt2-124M-edu-fineweb-10B

Quantizations
1 model

Dataset used to train rhysjones/gpt2-124M-edu-fineweb-10B