SAELens
ArthurConmyGDM committed (verified) · commit a6a8623 · 1 parent: 0ae72a1

Update README.md

Files changed (1):

  1. README.md +35 -3
README.md CHANGED
@@ -2,7 +2,7 @@
  license: cc-by-nc-4.0
  ---
 
- # Gemma Scope
+ # Gemma Scope:
 
  TODO add GIF here.
 
@@ -16,6 +16,38 @@ This is a landing page for **Gemma Scope**, a comprehensive, open suite of spars
  - Read the Gemma Scope technical report (TODO link).
  - Check out Mishax, an internal tool we used to help make Gemma Scope (TODO link).
 
- Download Gemma Scope here:
+ # Quick start:
 
- TODO make table
+ You can get started with Gemma Scope by downloading the weights from any of our repositories:
+
+ - https://huggingface.co/google/gemma-scope-2b-pt-res
+ - https://huggingface.co/google/gemma-scope-2b-pt-mlp
+ - https://huggingface.co/google/gemma-scope-2b-pt-att
+ - https://huggingface.co/google/gemma-scope-2b-pt-transcoders
+ - https://huggingface.co/google/gemma-scope-9b-pt-res
+ - https://huggingface.co/google/gemma-scope-9b-pt-mlp
+ - https://huggingface.co/google/gemma-scope-9b-pt-att
+ - https://huggingface.co/google/gemma-scope-9b-it-res
+ - https://huggingface.co/google/gemma-scope-27b-pt-res
+
+ The full list of SAEs we trained, together with the sites and layers they were trained at, is linked from the following table, adapted from Figure 1 of our technical report:
+
+ | <big>Gemma 2 Model</big> | <big>SAE Width</big> | <big>Attention</big> | <big>MLP</big> | <big>Residual</big> | <big>Tokens</big> |
+ |---------------|-----------|-----------|-----|----------|----------|
+ | 2.6B PT<br>(26 layers) | 2^14 ≈ 16.4K | [All](https://huggingface.co/google/gemma-scope-2b-pt-att) | [All](https://huggingface.co/google/gemma-scope-2b-pt-mlp)[+](https://huggingface.co/google/gemma-scope-2b-pt-transcoders) | [All](https://huggingface.co/google/gemma-scope-2b-pt-res) | 4B |
+ | | 2^15 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_32k/)} | 8B |
+ | | 2^16 | [All](https://huggingface.co/google/gemma-scope-2b-pt-att) | [All](https://huggingface.co/google/gemma-scope-2b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-2b-pt-res) | 8B |
+ | | 2^17 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_131k/)} | 8B |
+ | | 2^18 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_262k/)} | 8B |
+ | | 2^19 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_524k/)} | 8B |
+ | | 2^20 ≈ 1M | | | {[5](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_5/width_1m/), [12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_1m/), [19](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_19/width_1m/)} | 16B |
+ | 9B PT<br>(42 layers) | 2^14 | [All](https://huggingface.co/google/gemma-scope-9b-pt-att) | [All](https://huggingface.co/google/gemma-scope-9b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-9b-pt-res) | 4B |
+ | | 2^15 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_32k/)} | 8B |
+ | | 2^16 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_65k/)} | 8B |
+ | | 2^17 | [All](https://huggingface.co/google/gemma-scope-9b-pt-att) | [All](https://huggingface.co/google/gemma-scope-9b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-9b-pt-res) | 8B |
+ | | 2^18 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_262k/)} | 8B |
+ | | 2^19 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_524k/)} | 8B |
+ | | 2^20 | | | {[9](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_9/width_1m/), [20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_1m/), [31](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_31/width_1m/)} | 16B |
+ | 27B PT<br>(46 layers) | 2^17 | | | {[10](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_10/width_131k/), [22](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_22/width_131k/), [34](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_34/width_131k/)} | 8B |
+ | 9B IT<br>(42 layers) | 2^14 | | | {[9](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_9/width_16k/), [20](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_20/width_16k/), [31](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_31/width_16k/)} | 4B |
+ | | 2^17 | | | {[9](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_9/width_131k/), [20](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_20/width_131k/), [31](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_31/width_131k/)} | 8B |
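
For a concrete version of the quick start the commit adds, here is a minimal sketch of fetching one SAE's weights with `huggingface_hub` and inspecting them with NumPy. The sub-path is hypothetical: the `layer_<L>/width_<W>/` directory pattern is inferred from the tree links in the table, and the `average_l0_82` component is an assumption not specified in this commit, so check the repository's file browser for the real path before running it.

```python
# Minimal sketch of downloading one Gemma Scope SAE.
# ASSUMPTION: parameters live in .npz archives under a
# layer_<L>/width_<W>/average_l0_<X>/ path, as the tree links in the
# table suggest; the exact filename below is illustrative only.
import numpy as np
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="google/gemma-scope-2b-pt-res",  # residual-stream SAEs, Gemma 2 2.6B PT
    # Hypothetical sub-path: layer 12 at width 2^14 ≈ 16.4K.
    filename="layer_12/width_16k/average_l0_82/params.npz",
)

params = np.load(path)  # .npz archive of the SAE's parameter arrays
for name in params.files:
    print(name, params[name].shape)
```

The other repositories in the list work the same way; only `repo_id` and the sub-path change.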
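
Because the table links only a subset of layer/width directories directly, a quick way to see exactly which SAEs a repository contains is `huggingface_hub`'s standard file-listing call. A sketch, again assuming the `layer_<L>/width_<W>/` layout visible in the tree links:

```python
# Sketch: enumerate the (layer, width) combinations present in one repo,
# to cross-check against the table above. Assumes the layer_<L>/width_<W>/
# directory layout seen in the table's tree links.
from huggingface_hub import list_repo_files

files = list_repo_files("google/gemma-scope-2b-pt-res")
combos = sorted(
    {tuple(f.split("/")[:2]) for f in files if f.startswith("layer_") and f.count("/") >= 2}
)
for layer, width in combos:
    print(f"{layer}/{width}")
```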