Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models • Paper 2408.00113 • Published Jul 31, 2024
NNsight and NDIF: Democratizing Access to Foundation Model Internals • Paper 2407.14561 • Published Jul 18, 2024
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task • Paper 2402.11917 • Published Feb 19, 2024