Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest
Abstract
Massive amounts of high-quality data, both pre-training raw texts and post-training annotations, have been carefully prepared to incubate advanced large language models (LLMs). In contrast, for information extraction (IE), pre-training data such as BIO-tagged sequences are hard to scale up. We show that IE models can act as free riders on LLM resources by reframing next-token prediction as extraction of tokens already present in the context. Specifically, our proposed next tokens extraction (NTE) paradigm learns a versatile IE model, Cuckoo, with 102.6M extractive data instances converted from LLMs' pre-training and post-training data. Under the few-shot setting, Cuckoo adapts effectively to both traditional and complex instruction-following IE, outperforming existing pre-trained IE models. As a free rider, Cuckoo can evolve naturally with ongoing advancements in LLM data preparation, benefiting from improvements in LLM training pipelines without additional manual effort.
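To make the NTE reframing concrete, below is a minimal sketch of how a next-token-prediction instance might be converted into a BIO-tagged extraction instance when the upcoming tokens already appear in the context. The function name, tokenization, and example are illustrative assumptions, not the authors' released conversion pipeline.

```python
# Illustrative sketch of next tokens extraction (NTE): instead of predicting
# the next tokens, tag their occurrences inside the context with BIO labels,
# turning a language-modeling instance into an extraction instance.
# Names here are hypothetical, not the paper's actual implementation.

def nte_labels(context_tokens, next_tokens):
    """Return BIO tags over the context marking spans that match next_tokens.

    Tokens outside any match get "O"; a matched span gets "B" on its first
    token and "I" on the rest, mirroring standard BIO tagging in IE.
    """
    n, m = len(context_tokens), len(next_tokens)
    labels = ["O"] * n
    for start in range(n - m + 1):
        if context_tokens[start:start + m] == next_tokens:
            labels[start] = "B"
            for k in range(start + 1, start + m):
                labels[k] = "I"
    return labels

# Example: the continuation "Paris" already occurs in the context, so this
# next-token-prediction instance becomes an extraction instance.
context = "The capital of France is Paris . The capital of France is".split()
print(nte_labels(context, ["Paris"]))
# ['O', 'O', 'O', 'O', 'O', 'B', 'O', 'O', 'O', 'O', 'O', 'O']
```

Because any pre-training or post-training text whose continuation tokens recur in the context can be converted this way, extraction supervision scales with LLM data rather than with manual BIO annotation.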
Community
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this one:
- Asymmetric Conflict and Synergy in Post-training for LLM-based Multilingual Machine Translation (2025)
- DarwinLM: Evolutionary Structured Pruning of Large Language Models (2025)
- UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs (2025)
- Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training (2025)
- Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training (2025)
- Control LLM: Controlled Evolution for Intelligence Retention in LLM (2025)
- Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages (2025)
Models citing this paper: 4
Datasets citing this paper: 4
Spaces citing this paper: 0
Collections including this paper: 0