---
license: mit
datasets:
- koutch/stackoverflow_python
- Vezora/Tested-143k-Python-Alpaca
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

This repository contains sparse autoencoders (SAEs) trained to analyze the internal representations of the Llama 3.1 8B Instruct model. The autoencoders are trained on residual stream activations collected while the model processes code-related instruction data.

We apply these specialized, lightweight SAEs to a coding task in our blog post [Sieve](https://tilderesearch.com/blog/sieve).

## Model Details

- **Model Type:** TopK Sparse Autoencoder
- **Base Model:** Llama 3.1 8B Instruct
- **Training Data:** 1B tokens of code data from:
  - StackOverflow Python dataset
  - Tested-143k Python Alpaca dataset
- **Architecture:** Linear encoder-decoder with ReLU and TopK activation (k=64, 512)
- **File Format:** PyTorch `.pt` files containing:
  - `W_enc_DF`: Encoder weight matrix
  - `b_enc_F`: Encoder bias vector
  - `W_dec_FD`: Decoder weight matrix
  - `b_dec_D`: Decoder bias vector

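As a rough illustration of how the four tensors listed above fit together, the sketch below runs a TopK SAE forward pass in PyTorch. It assumes the key suffixes encode shapes (`D` = model width, `F` = number of latents), that ReLU is applied before the TopK selection, and that no bias is subtracted from the input before encoding. None of these details are confirmed by this card; the reference implementation lives in the Sieve repo. The function name `topk_sae_forward` and the toy sizes are hypothetical.

```python
import torch


def topk_sae_forward(x_BD, params, k):
    """Encode activations with ReLU + TopK, then decode.

    Assumed shapes (inferred from the key suffixes, not confirmed):
    W_enc_DF is (D, F), b_enc_F is (F,), W_dec_FD is (F, D), b_dec_D is (D,).
    """
    # Pre-activations for every latent. Some SAE variants subtract b_dec_D
    # from the input first; that step is left out here as an assumption.
    pre_BF = x_BD @ params["W_enc_DF"] + params["b_enc_F"]
    acts_BF = torch.relu(pre_BF)

    # Keep only the k largest activations per row and zero out the rest.
    top_vals_BK, top_idx_BK = torch.topk(acts_BF, k, dim=-1)
    latents_BF = torch.zeros_like(acts_BF).scatter(-1, top_idx_BK, top_vals_BK)

    # Linear decoder maps the sparse latents back to model space.
    recon_BD = latents_BF @ params["W_dec_FD"] + params["b_dec_D"]
    return latents_BF, recon_BD


# Toy demo with random weights; these sizes are illustrative only.
torch.manual_seed(0)
D_MODEL, N_LATENTS, K = 16, 64, 8
params = {
    "W_enc_DF": torch.randn(D_MODEL, N_LATENTS),
    "b_enc_F": torch.zeros(N_LATENTS),
    "W_dec_FD": torch.randn(N_LATENTS, D_MODEL),
    "b_dec_D": torch.zeros(D_MODEL),
}
latents_BF, recon_BD = topk_sae_forward(torch.randn(4, D_MODEL), params, K)
print(latents_BF.shape, recon_BD.shape)
```

With a real checkpoint, `params` would presumably be the dictionary returned by `torch.load` on one of the `.pt` files, with `k` set to the value that SAE was trained with.
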
## Usage

The autoencoders can be used to analyze and interpret the internal representations that Llama 3.1 8B Instruct forms when processing code. Because they are trained on a narrow, code-specific data mixture, they are not recommended for general-purpose use.
They can be used to reproduce the results of the Sieve evaluation for Llama 3.1 8B Instruct.

Example usage can be found in the [Sieve repo](https://github.com/tilde-research/sieve).

## Training Details

- **Training Data Size:** 1B tokens
- **Domain:** Python code and code-related instructions
- **Target:** Residual stream activations at layers 8, 10, and 12 of Llama 3.1 8B Instruct
- **Compute:** Around 9 A100 GPU-hours

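To make the "Target" entry more concrete, here is one possible way to capture a layer's output with a PyTorch forward hook. The helper name `capture_layer_output` is hypothetical, and the toy `nn.Sequential` stands in for the real model so that the snippet runs without downloading weights. Whether the SAEs expect the output of a given decoder block or its input is an assumption to check against the Sieve repo.

```python
import torch
from torch import nn


def capture_layer_output(layer: nn.Module, run_forward):
    """Call `run_forward()` and return what `layer` produced during that call."""
    captured = {}

    def hook(module, inputs, output):
        # Some transformer blocks return a tuple whose first element is the
        # hidden state; others return the tensor directly. Handle both.
        hidden = output[0] if isinstance(output, tuple) else output
        captured["resid"] = hidden.detach()

    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            run_forward()
    finally:
        handle.remove()  # always detach the hook, even if the forward pass fails
    return captured["resid"]


# Toy stand-in for a stack of transformer blocks.
torch.manual_seed(0)
toy_layers = nn.Sequential(*[nn.Linear(16, 16) for _ in range(4)])
x = torch.randn(2, 5, 16)  # (batch, sequence, hidden)
resid = capture_layer_output(toy_layers[2], lambda: toy_layers(x))
print(resid.shape)
```

With the Hugging Face `transformers` Llama implementation, the same helper could be pointed at `model.model.layers[8]` and given `lambda: model(**inputs)` as the forward call; the captured tensor would then be what gets fed to the SAE for that layer.
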
## License

MIT

## Citation

If you use these models in your research, please cite:

```bibtex
@article{karvonen2024sieve,
  title={Sieve: SAEs Beat Baselines on a Real-World Task (A Code Generation Case Study)},
  author={Karvonen, Adam and Pai, Dhruv and Wang, Mason and Keigwin, Ben},
  journal={Tilde Research Blog},
  year={2024},
  month={12},
  url={https://www.tilderesearch.com/blog/sieve},
  note={Blog post}
}
```