license: cc-by-4.0
library_name: saelens

Gemma Scope:

This is a landing page for Gemma Scope, a comprehensive, open suite of sparse autoencoders (SAEs) for Gemma 2 9B and 2B. Sparse autoencoders are a "microscope" of sorts: they help us break down a model’s internal activations into the underlying concepts, just as biologists use microscopes to study the individual cells of plants and animals.

There are no model weights in this repo. If you are looking for them, please visit one of our repos:

This tutorial has instructions on how to load the SAEs, and this tutorial explains and implements JumpReLU SAE training in PyTorch and JAX.
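As a rough illustration of what a JumpReLU SAE computes, here is a minimal forward-pass sketch. It is not the tutorial's code: it is written in plain Python rather than PyTorch or JAX so that it runs without dependencies, the parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`, `thresholds`) are assumptions chosen for illustration, and the toy weights are made up. The one idea it shows is that each latent keeps its pre-activation only when that value exceeds a per-latent threshold.

```python
def jumprelu(pre_acts, thresholds):
    """Keep each pre-activation only if it exceeds its own threshold, else 0."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def encode(x, W_enc, b_enc, thresholds):
    """Map an activation vector x (length d_model) to sparse latents (length d_sae)."""
    # W_enc is stored as d_model rows of d_sae columns.
    pre_acts = [
        sum(x[i] * W_enc[i][j] for i in range(len(x))) + b_enc[j]
        for j in range(len(b_enc))
    ]
    return jumprelu(pre_acts, thresholds)


def decode(acts, W_dec, b_dec):
    """Reconstruct the activation vector from the sparse latents."""
    # W_dec is stored as d_sae rows of d_model columns.
    return [
        sum(acts[j] * W_dec[j][k] for j in range(len(acts))) + b_dec[k]
        for k in range(len(b_dec))
    ]


# Toy, made-up parameters: d_model = 2, d_sae = 3.
W_enc = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
b_enc = [0.0, 0.0, 0.0]
thresholds = [1.5, 1.5, 1.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_dec = [0.0, 0.0]

x = [1.0, 2.0]
acts = encode(x, W_enc, b_enc, thresholds)
recon = decode(acts, W_dec, b_dec)
l0 = sum(1 for a in acts if a != 0.0)  # number of active latents for this input
```

In this toy case the first latent has pre-activation 1.0, which is below its threshold of 1.5, so it is zeroed and only two of the three latents contribute to the reconstruction.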

Key links:

Full weight set:

The full list of SAEs we trained, along with the sites and layers they were trained on, is linked from the following table, adapted from Figure 1 of our technical report:

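| Gemma 2 Model | SAE Width | Attention | MLP | Residual | Tokens |
|---|---|---|---|---|---|
| 2.6B PT (26 layers) | 2^14 ≈ 16.4K | All | All | All+ | 4B |
| | 2^15 | | | {12} | 8B |
| | 2^16 | All | All | All | 8B |
| | 2^17 | | | {12} | 8B |
| | 2^18 | | | {12} | 8B |
| | 2^19 | | | {12} | 8B |
| | 2^20 ≈ 1M | | | {5, 12, 19} | 16B |
| 9B PT (42 layers) | 2^14 | All | All | All | 4B |
| | 2^15 | | | {20} | 8B |
| | 2^16 | | | {20} | 8B |
| | 2^17 | All | All | All | 8B |
| | 2^18 | | | {20} | 8B |
| | 2^19 | | | {20} | 8B |
| | 2^20 | | | {9, 20, 31} | 16B |
| 27B PT (46 layers) | 2^17 | | | {10, 22, 34} | 8B |
| 9B IT (42 layers) | 2^14 | | | {9, 20, 31} | 4B |
| | 2^17 | | | {9, 20, 31} | 8B |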
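The widths in the table are powers of two, with only the smallest and largest given approximate values. A short sketch that spells out the exact number of latents for each width (the variable names are illustrative only):

```python
# SAE widths in the table range from 2^14 to 2^20 latents.
SAE_WIDTH_EXPONENTS = range(14, 21)

widths = {exp: 2 ** exp for exp in SAE_WIDTH_EXPONENTS}

for exp, width in widths.items():
    print(f"2^{exp} = {width:,} latents")
```

This gives 16,384 latents for 2^14 (the "≈ 16.4K" entry) and 1,048,576 for 2^20 (the "≈ 1M" entry).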

Which SAE is in the Neuronpedia demo?

https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_20/width_16k/average_l0_71
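The URL above encodes which SAE is meant: the repository, then a folder per layer, width, and sparsity level. The sketch below splits it into those parts using only the standard library. The helper name `split_hub_tree_url` is hypothetical, and reading `average_l0_71` as an average L0 of about 71 (active latents per token) is an assumption based on the folder name.

```python
from urllib.parse import urlparse

DEMO_URL = (
    "https://huggingface.co/google/gemma-scope-2b-pt-res"
    "/tree/main/layer_20/width_16k/average_l0_71"
)


def split_hub_tree_url(url):
    """Split a Hub 'tree' URL into (repo_id, revision, subfolder)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected layout: <org>/<repo>/tree/<revision>/<subfolder...>
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a Hub tree URL: {url}")
    repo_id = "/".join(parts[:2])
    revision = parts[3]
    subfolder = "/".join(parts[4:])
    return repo_id, revision, subfolder


repo_id, revision, subfolder = split_hub_tree_url(DEMO_URL)

# Folder names follow the pattern layer_<n>/width_<w>/average_l0_<k>.
layer_dir, width_dir, l0_dir = subfolder.split("/")
sae_info = {
    "layer": int(layer_dir.rsplit("_", 1)[1]),
    "width": width_dir.rsplit("_", 1)[1],
    "average_l0": int(l0_dir.rsplit("_", 1)[1]),
}
```

So the demo SAE is the residual-stream SAE for Gemma 2 2B at layer 20 with width 16k. The repo id and subfolder are the pieces a Hub download client would typically need; the name of the weight file inside the folder is not given on this page, so it is left out here.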

Citation

@misc{lieberum2024gemmascopeopensparse,
      title={Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2}, 
      author={Tom Lieberum and Senthooran Rajamanoharan and Arthur Conmy and Lewis Smith and Nicolas Sonnerat and Vikrant Varma and János Kramár and Anca Dragan and Rohin Shah and Neel Nanda},
      year={2024},
      eprint={2408.05147},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.05147}, 
}

Paper link: https://arxiv.org/abs/2408.05147