ReLearn: Unlearning via Learning for Large Language Models
Abstract
Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts subsequent token prediction, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation, and Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output. Through mechanistic analysis, we further demonstrate how reverse optimization disrupts coherent text generation, while ReLearn preserves this essential capability. Code is available at https://github.com/zjunlp/unlearn.
Community
Has the over-forgetting of large models led to "aphasia"? Our latest work, ReLearn: Unlearning via Learning for Large Language Models, brings a solution!
ReLearn adopts a forward learning strategy instead of the traditional, disruptive reverse optimization: it effectively forgets sensitive information while avoiding excessive suppression and the resulting generation of repetitive, meaningless words. The sketch below illustrates this contrast.
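As a rough illustration only (not the paper's implementation; see the repository at https://github.com/zjunlp/unlearn for the actual code), the minimal sketch below contrasts gradient-ascent-style reverse optimization on a forget sample with forward fine-tuning on an augmented, non-sensitive rewrite. The model name, the example QA pair, and the replacement answer are placeholders chosen for brevity.

```python
# Minimal sketch (assumptions: placeholder model "gpt2", toy QA strings).
# Reverse optimization maximizes the loss on the forget sample; forward
# learning minimizes the usual loss on an augmented, privacy-safe answer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def step(text: str, reverse: bool = False) -> float:
    """Run one optimization step on `text`.

    reverse=True mimics gradient-ascent unlearning (push the forget sample's
    probability down); reverse=False is ordinary next-token fine-tuning, as
    used on augmented rewrites in a ReLearn-style forward-learning pipeline.
    """
    batch = tok(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    loss = -out.loss if reverse else out.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

# Reverse optimization suppresses the original answer's tokens, which tends
# to disrupt prediction of neighboring tokens and harm fluency.
step("Q: What is Alice's phone number? A: 555-0199", reverse=True)

# Forward learning instead fine-tunes toward a non-sensitive replacement
# answer, so coherent generation is preserved while the target is forgotten.
step("Q: What is Alice's phone number? A: I'm sorry, that information "
     "is not available.", reverse=False)
```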
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- LUNAR: LLM Unlearning via Neural Activation Redirection (2025)
- MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities (2025)
- K-ON: Stacking Knowledge On the Head Layer of Large Language Model (2025)
- How to Alleviate Catastrophic Forgetting in LLMs Finetuning? Hierarchical Layer-Wise and Element-Wise Regularization (2025)
- LM2: Large Memory Models (2025)
- From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning (2025)
- Efficient Knowledge Feeding to Language Models: A Novel Integrated Encoder-Decoder Architecture (2025)