---
license: cc-by-nc-4.0
---

# Gemma Scope

TODO add GIF here.
This is a landing page for **Gemma Scope**, a comprehensive, open suite of sparse autoencoders (SAEs).
- Read the Gemma Scope technical report (TODO link).
- Check out Mishax, an internal tool we used to help make Gemma Scope (TODO link).
# Quick start
You can get started with Gemma Scope by downloading the weights from any of our repositories:

- https://huggingface.co/google/gemma-scope-2b-pt-res
- https://huggingface.co/google/gemma-scope-2b-pt-mlp
- https://huggingface.co/google/gemma-scope-2b-pt-att
- https://huggingface.co/google/gemma-scope-2b-pt-transcoders
- https://huggingface.co/google/gemma-scope-9b-pt-res
- https://huggingface.co/google/gemma-scope-9b-pt-mlp
- https://huggingface.co/google/gemma-scope-9b-pt-att
- https://huggingface.co/google/gemma-scope-9b-it-res
- https://huggingface.co/google/gemma-scope-27b-pt-res
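As a sketch of how these weights can be used: the snippet below downloads one SAE's parameters with `huggingface_hub` and implements JumpReLU encode/decode in NumPy. The `layer_20/width_16k/average_l0_71/params.npz` path and the parameter key names are illustrative assumptions about the repo layout (the `average_l0_*` component differs per SAE), so browse the repo's file tree for the exact filename.

```python
import numpy as np


class JumpReLUSAE:
    """Minimal JumpReLU sparse autoencoder (encode/decode only)."""

    def __init__(self, params: dict):
        self.W_enc = params["W_enc"]          # (d_model, d_sae)
        self.b_enc = params["b_enc"]          # (d_sae,)
        self.W_dec = params["W_dec"]          # (d_sae, d_model)
        self.b_dec = params["b_dec"]          # (d_model,)
        self.threshold = params["threshold"]  # (d_sae,) learned per-latent threshold

    def encode(self, x: np.ndarray) -> np.ndarray:
        pre = x @ self.W_enc + self.b_enc
        # JumpReLU: keep a pre-activation only where it exceeds its threshold.
        return pre * (pre > self.threshold)

    def decode(self, acts: np.ndarray) -> np.ndarray:
        return acts @ self.W_dec + self.b_dec


def load_gemma_scope_sae(
    repo_id: str = "google/gemma-scope-2b-pt-res",
    filename: str = "layer_20/width_16k/average_l0_71/params.npz",
) -> JumpReLUSAE:
    # Requires `pip install huggingface_hub`; downloads the weights (~100 MB).
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=filename)
    return JumpReLUSAE(dict(np.load(path)))


# sae = load_gemma_scope_sae()  # network access required
```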
The full list of SAEs we trained, and the sites and layers they cover, is linked from the following table, adapted from Figure 1 of our technical report:

| <big>Gemma 2 Model</big> | <big>SAE Width</big> | <big>Attention</big> | <big>MLP</big> | <big>Residual</big> | <big>Tokens</big> |
|---------------|-----------|-----------|-----|----------|----------|
| 2.6B PT<br>(26 layers) | 2^14 ≈ 16.4K | [All](https://huggingface.co/google/gemma-scope-2b-pt-att) | [All](https://huggingface.co/google/gemma-scope-2b-pt-mlp)[+](https://huggingface.co/google/gemma-scope-2b-pt-transcoders) | [All](https://huggingface.co/google/gemma-scope-2b-pt-res) | 4B |
| | 2^15 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_32k/)} | 8B |
| | 2^16 | [All](https://huggingface.co/google/gemma-scope-2b-pt-att) | [All](https://huggingface.co/google/gemma-scope-2b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-2b-pt-res) | 8B |
| | 2^17 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_131k/)} | 8B |
| | 2^18 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_262k/)} | 8B |
| | 2^19 | | | {[12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_524k/)} | 8B |
| | 2^20 ≈ 1M | | | {[5](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_5/width_1m/), [12](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_12/width_1m/), [19](https://huggingface.co/google/gemma-scope-2b-pt-res/tree/main/layer_19/width_1m/)} | 16B |
| 9B PT<br>(42 layers) | 2^14 | [All](https://huggingface.co/google/gemma-scope-9b-pt-att) | [All](https://huggingface.co/google/gemma-scope-9b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-9b-pt-res) | 4B |
| | 2^15 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_32k/)} | 8B |
| | 2^16 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_65k/)} | 8B |
| | 2^17 | [All](https://huggingface.co/google/gemma-scope-9b-pt-att) | [All](https://huggingface.co/google/gemma-scope-9b-pt-mlp) | [All](https://huggingface.co/google/gemma-scope-9b-pt-res) | 8B |
| | 2^18 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_262k/)} | 8B |
| | 2^19 | | | {[20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_524k/)} | 8B |
| | 2^20 | | | {[9](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_9/width_1m/), [20](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_20/width_1m/), [31](https://huggingface.co/google/gemma-scope-9b-pt-res/tree/main/layer_31/width_1m/)} | 16B |
| 27B PT<br>(46 layers) | 2^17 | | | {[10](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_10/width_131k/), [22](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_22/width_131k/), [34](https://huggingface.co/google/gemma-scope-27b-pt-res/tree/main/layer_34/width_131k/)} | 8B |
| 9B IT<br>(42 layers) | 2^14 | | | {[9](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_9/width_16k/), [20](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_20/width_16k/), [31](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_31/width_16k/)} | 4B |
| | 2^17 | | | {[9](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_9/width_131k/), [20](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_20/width_131k/), [31](https://huggingface.co/google/gemma-scope-9b-it-res/tree/main/layer_31/width_131k/)} | 8B |
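A note on navigating the repos: the `width_*` directory names in the links above appear to truncate the decimal value of each power of two (2^14 = 16384 → `width_16k`, 2^17 = 131072 → `width_131k`, 2^20 → `width_1m`). A small hypothetical helper, consistent with the paths linked in the table:

```python
def width_dir(log2_width: int) -> str:
    """Map an SAE width exponent to the width_* directory name used in the repos."""
    n = 2 ** log2_width
    if n >= 10**6:
        return f"width_{n // 10**6}m"  # 2^20 -> width_1m
    return f"width_{n // 1000}k"       # 2^17 -> width_131k
```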