Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models
Abstract
To truly understand vision models, we must not only interpret their learned features but also validate these interpretations through controlled experiments. Current approaches either provide interpretable features without the ability to test their causal influence, or enable model editing without interpretable controls. We present a unified framework using sparse autoencoders (SAEs) that bridges this gap, allowing us to discover human-interpretable visual features and precisely manipulate them to test hypotheses about model behavior. By applying our method to state-of-the-art vision models, we reveal key differences in the semantic abstractions learned by models with different pre-training objectives. We then demonstrate the practical utility of our framework through controlled interventions across multiple vision tasks. We show that SAEs can reliably identify and manipulate interpretable visual features without model re-training, providing a powerful tool for understanding and controlling vision model behavior. We provide code, demos, and models on our project website: https://osu-nlp-group.github.io/SAE-V.
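To make the two steps the abstract describes concrete, here is a minimal sketch: a sparse autoencoder trained on a vision model's activations to discover features, plus a test-time intervention that rescales one learned feature to probe its causal effect. This is an illustration, not the authors' released implementation (that is on the project website); all names and hyperparameters (SparseAutoencoder, d_model=768, n_features, feature_idx) are assumptions for the example.

```python
# Minimal SAE sketch over vision-model activations (illustrative only;
# names and hyperparameters are placeholder assumptions).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # ReLU yields sparse, non-negative codes
        return self.decoder(f), f

def sae_loss(sae: SparseAutoencoder, x: torch.Tensor, l1_coeff: float = 1e-3):
    # Standard SAE objective: reconstruction error plus an L1 sparsity penalty.
    x_hat, f = sae(x)
    return ((x_hat - x) ** 2).mean() + l1_coeff * f.abs().sum(dim=-1).mean()

@torch.no_grad()
def intervene(sae: SparseAutoencoder, x: torch.Tensor,
              feature_idx: int, scale: float = 0.0) -> torch.Tensor:
    # Rescale a single learned feature (scale=0 ablates it) and decode back
    # into the vision model's activation space; no re-training involved.
    _, f = sae(x)
    f[:, feature_idx] *= scale
    return sae.decoder(f)

# Toy usage with random stand-ins for hidden states hooked out of a
# pretrained vision backbone (e.g. a ViT with d_model=768).
x = torch.randn(8, 768)
sae = SparseAutoencoder(d_model=768, n_features=16384)
loss = sae_loss(sae, x)                     # one training-step objective
edited = intervene(sae, x, feature_idx=42)  # ablate feature 42
```

In a real experiment, `edited` would be patched back into the vision model's forward pass, and the resulting change in the model's output is the controlled causal test the abstract refers to.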
Community
This is an automated message from Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment (2025)
- SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders (2025)
- Disentangling CLIP Features for Enhanced Localized Understanding (2025)
- COMIX: Compositional Explanations using Prototypes (2025)
- Analyze Feature Flow to Enhance Interpretation and Steering in Language Models (2025)
- Steering Large Language Models with Feature Guided Activation Additions (2025)
- AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding (2025)
You can ask Librarian Bot for paper recommendations directly by tagging it in a comment: @librarian-bot recommend