Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Paper • 2206.04615 • Published Jun 9, 2022
Emb-GAM: an Interpretable and Efficient Predictor using Pre-trained Language Models Paper • 2209.11799 • Published Sep 23, 2022
Interpretations are useful: penalizing explanations to align neural networks with prior knowledge Paper • 1909.13584 • Published Sep 30, 2019
Rethinking Interpretability in the Era of Large Language Models Paper • 2402.01761 • Published Jan 30, 2024
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs Paper • 2311.02262 • Published Nov 3, 2023
Tree Prompting: Efficient Task Adaptation without Fine-Tuning Paper • 2310.14034 • Published Oct 21, 2023
Explaining black box text modules in natural language with language models Paper • 2305.09863 • Published May 17, 2023
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation Paper • 2112.02721 • Published Dec 6, 2021