This repository contains ~45 folders. Each folder contains a transformer model that predicts the answers to addition questions, subtraction questions, or both.

The folder name (e.g. sub_d6_l2_h3_t20K_s173289) contains:

  • "add", "sub", or "mix": Shows the types of questions the model can predict.
  • "d5" to "d20": The number of digits the model handles e.g. a d6 sub model can predict the answer in 123450-345670=-0222220
  • "l1", "l2" or "l3": The number of layers in the model
  • "h3" or "h4": The number of attention heads in the model
  • "t15K", "t85K", etc: The number of batches the model was trained on
  • "s372001" etc: The random seed used in model training

Some folder names also contain:

  • "ins1": Before training, the model was initialized with a smaller, accurate addition model
  • "ins2": As per ins1, but the inserted, useful attention heads were frozen (not allowed to change) during training
  • "ins3": As per ins2, but the inserted MLP layers were also frozen
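The naming scheme above is mechanical enough to parse programmatically. Below is a minimal sketch of such a parser; `parse_folder_name` is a hypothetical helper (not part of the repository's notebooks), and the position of the optional "ins" token within the name is an assumption:

```python
def parse_folder_name(name: str) -> dict:
    """Split a folder name like 'sub_d6_l2_h3_t20K_s173289' into its parts.

    Hypothetical helper based on the naming scheme described above.
    """
    parts = name.split("_")
    info = {"op": parts[0]}  # "add", "sub", or "mix"
    for part in parts[1:]:
        if part.startswith("ins"):
            info["ins"] = int(part[3:])          # initialization variant 1-3
        elif part.startswith("d"):
            info["digits"] = int(part[1:])       # digits handled
        elif part.startswith("l"):
            info["layers"] = int(part[1:])       # transformer layers
        elif part.startswith("h"):
            info["heads"] = int(part[1:])        # attention heads
        elif part.startswith("t"):
            info["batches"] = int(part[1:].rstrip("K")) * 1000  # training batches
        elif part.startswith("s"):
            info["seed"] = int(part[1:])         # training random seed
    return info

print(parse_folder_name("sub_d6_l2_h3_t20K_s173289"))
```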

Each folder contains:

  • model.pth: The transformer model as described above
  • training_loss.json: Data gathered during model training. Used to plot "loss over training batches" graphs
  • behaviors.json: Facts gathered about the behavior of the model by direct inspection. Includes attention pattern data, PCA data, answer digit impact data, etc.
  • features.json: Facts gathered about hypothesised algorithm features via experimentation e.g. node P12L0H1 implements the feature A3.ST.

The first two files were created by the https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsTrain.ipynb notebook. The last two files were created by the https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAnalyse.ipynb notebook. The JSON files are used by the algorithm-testing notebook https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/QuantaMathsAlgorithm.ipynb.
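Given the per-folder layout above, the JSON files can be loaded with the standard library alone. The sketch below assumes that layout; `load_model_data` is a hypothetical helper, not code from the repository's notebooks (loading `model.pth` additionally requires PyTorch and the model class, so it is left out here):

```python
import json
from pathlib import Path

def load_model_data(folder) -> dict:
    """Read the JSON files from one model folder, keyed by file stem.

    Hypothetical helper assuming the folder layout described above.
    Missing files are simply skipped.
    """
    folder = Path(folder)
    data = {}
    for stem in ("training_loss", "behaviors", "features"):
        path = folder / f"{stem}.json"
        if path.exists():
            data[stem] = json.loads(path.read_text())
    return data
```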
