
This is a one-layer base model with the Llama 2 architecture, trained on 6B tokens of the algebraic-stack subset of the Proof-Pile 2 dataset.
Its output distribution is therefore mostly concerned with code. The tokenizer is the Llama 2 tokenizer. I used the following hyperparameters:
- d_model = 512
- d_ff = 2048
- n_heads = 4
- n_ctx = 1024
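
For reference, here is a minimal sketch (assuming the Hugging Face transformers library; this is not the actual training code) of a LlamaConfig with these hyperparameters. The resulting parameter count roughly matches the 37M reported below.

```python
# Sketch only: a transformers LlamaConfig matching the hyperparameters above.
# vocab_size assumes the Llama 2 tokenizer (32k entries).
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,              # Llama 2 tokenizer vocabulary
    hidden_size=512,               # d_model
    intermediate_size=2048,        # d_ff
    num_hidden_layers=1,           # one transformer layer
    num_attention_heads=4,         # n_heads
    max_position_embeddings=1024,  # n_ctx
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```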

For training I used AdamW with a weight decay of 0.05 and a cosine-annealing learning-rate schedule with 5000 warmup steps and a maximum learning rate of 1e-4. Training was done in BF16 precision.
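
As a rough illustration of that setup (not the actual training script), the optimizer and schedule could be built with PyTorch and transformers as follows; the total step count is a placeholder, since it is not stated here.

```python
# Sketch only: AdamW with weight decay 0.05 and cosine annealing with
# 5000 warmup steps, peaking at a learning rate of 1e-4.
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 100_000  # hypothetical; the real number of optimizer steps is not given

# `model` is the LlamaForCausalLM instance from the sketch above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=5000,
    num_training_steps=total_steps,
)
```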

Train loss: 2.6228
Test loss: 2.7490

Model size: 37M parameters (safetensors, BF16).
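
To try the checkpoint locally, something like the following should work, assuming the tokenizer files are included in the repository and the weights load with transformers' auto classes (a sketch, not an officially tested snippet):

```python
# Sketch only: load Ffohturk/Mila_1L and generate a short code completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Ffohturk/Mila_1L")
model = AutoModelForCausalLM.from_pretrained("Ffohturk/Mila_1L", torch_dtype=torch.bfloat16)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```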
