Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms Paper • 2403.17806 • Published Mar 26 • 3
Daily Picks in Interpretability & Analysis of LMs Collection Outstanding research in interpretability and evaluation of language models, summarized • 90 items • Updated about 17 hours ago • 92