This is a GPT-2 model trained in llm.c for 330K steps (at a batch size of 1M tokens) on FineWeb-EDU.

Much more detailed information is available here: https://github.com/karpathy/llm.c/discussions/677 .

This model has a somewhat complicated history. I wanted to train it for 400K steps (i.e. `-x 400000`), but training became unstable and the loss exploded around step 330K. Because I was about to lose my compute quota, I decided to rewind to the 300K checkpoint and, instead of going all the way to 400K, anneal the learning rate linearly down to zero by step 330K. That run completed without incident and produced this model.
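The rewind-and-anneal schedule described above can be sketched as follows. This is an illustrative Python sketch, not code from llm.c; the function name, base learning rate, and step boundaries are assumptions chosen to mirror the description (hold the main-run LR through step 300K, then decay linearly to zero by step 330K).

```python
def lr_at(step, base_lr=1e-4, anneal_start=300_000, anneal_end=330_000):
    """Illustrative LR schedule (assumed values, not llm.c's actual config):
    constant base_lr up to anneal_start, then linear decay to 0 at anneal_end."""
    if step <= anneal_start:
        return base_lr
    if step >= anneal_end:
        return 0.0
    # Fraction of the annealing window already completed.
    frac = (step - anneal_start) / (anneal_end - anneal_start)
    return base_lr * (1.0 - frac)

# Halfway through the anneal, the LR is half the base LR.
print(lr_at(315_000))
```

The point of annealing from an earlier, still-healthy checkpoint is that the rapidly shrinking learning rate damps the updates before the instability can recur.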

This is the longest I've trained a GPT-2 model, and it reaches a HellaSwag accuracy of 62.7 by the end.

- Format: Safetensors
- Model size: 1.56B params
- Tensor type: BF16

Model tree for karpathy/gpt2_1558M_final4_hf:

- Adapters: 1 model
- Quantizations: 1 model