stereoplegic's Collections
Knowledge distillation
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
Paper
•
2310.13332
•
Published
•
15
Teaching Language Models to Self-Improve through Interactive Demonstrations
Paper
•
2310.13522
•
Published
•
12
Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection
Paper
•
2310.05035
•
Published
•
1
Tuna: Instruction Tuning using Feedback from Large Language Models
Paper
•
2310.13385
•
Published
•
11
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Paper
•
2310.11716
•
Published
•
5
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper
•
2310.13355
•
Published
•
9
Conditional Diffusion Distillation
Paper
•
2310.01407
•
Published
•
20
AutoMix: Automatically Mixing Language Models
Paper
•
2310.12963
•
Published
•
14
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper
•
2310.12962
•
Published
•
14
Effective Distillation of Table-based Reasoning Ability from LLMs
Paper
•
2309.13182
•
Published
•
1
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
Paper
•
2308.04679
•
Published
•
1
The Consensus Game: Language Model Generation via Equilibrium Search
Paper
•
2310.09139
•
Published
•
12
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Paper
•
2310.10134
•
Published
•
1
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Paper
•
2310.08461
•
Published
•
1
Large Language Models Are Also Good Prototypical Commonsense Reasoners
Paper
•
2309.13165
•
Published
•
1
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
Paper
•
2310.05074
•
Published
•
1
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Paper
•
2310.17653
•
Published
•
2
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
Paper
•
2305.11186
•
Published
•
1
Self-slimmed Vision Transformer
Paper
•
2111.12624
•
Published
•
1
Commonsense Knowledge Transfer for Pre-trained Language Models
Paper
•
2306.02388
•
Published
•
1
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Paper
•
2110.07178
•
Published
•
1
Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model
Paper
•
2306.10241
•
Published
•
1
Distilling Efficient Language-Specific Models for Cross-Lingual Transfer
Paper
•
2306.01709
•
Published
•
1
Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval
Paper
•
2204.02292
•
Published
•
1
Composable Sparse Fine-Tuning for Cross-Lingual Transfer
Paper
•
2110.07560
•
Published
•
1
HARD: Hard Augmentations for Robust Distillation
Paper
•
2305.14890
•
Published
•
1
Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese
Paper
•
2304.08823
•
Published
•
1
Massively Multilingual Lexical Specialization of Multilingual Transformers
Paper
•
2208.01018
•
Published
•
1
Robust Active Distillation
Paper
•
2210.01213
•
Published
•
1
LTD: Low Temperature Distillation for Robust Adversarial Training
Paper
•
2111.02331
•
Published
•
1
Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation
Paper
•
2306.16170
•
Published
•
1
Mutual Adversarial Training: Learning together is better than going alone
Paper
•
2112.05005
•
Published
•
1
Weight Averaging Improves Knowledge Distillation under Domain Shift
Paper
•
2309.11446
•
Published
•
1
Cross-Architecture Knowledge Distillation
Paper
•
2207.05273
•
Published
•
1
Cross-Domain Ensemble Distillation for Domain Generalization
Paper
•
2211.14058
•
Published
•
1
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
Paper
•
2202.13393
•
Published
•
1
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Paper
•
2305.02301
•
Published
•
3
Zephyr: Direct Distillation of LM Alignment
Paper
•
2310.16944
•
Published
•
123
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Paper
•
2310.18628
•
Published
•
8
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
Paper
•
2310.19019
•
Published
•
10
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Paper
•
2306.04751
•
Published
•
5
Small Language Models Improve Giants by Rewriting Their Outputs
Paper
•
2305.13514
•
Published
•
2
ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer
Paper
•
2309.08583
•
Published
•
1
A Survey on Model Compression for Large Language Models
Paper
•
2308.07633
•
Published
•
3
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Paper
•
2311.00430
•
Published
•
58
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Paper
•
2308.06744
•
Published
•
1
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Paper
•
2211.11014
•
Published
•
1
Model compression via distillation and quantization
Paper
•
1802.05668
•
Published
•
1
Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data
Paper
•
2302.10899
•
Published
•
1
Improving Differentiable Architecture Search via Self-Distillation
Paper
•
2302.05629
•
Published
•
1
Co-training and Co-distillation for Quality Improvement and Compression of Language Models
Paper
•
2311.02849
•
Published
•
4
Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper
•
2311.02805
•
Published
•
4
Can a student Large Language Model perform as well as it's teacher?
Paper
•
2310.02421
•
Published
•
1
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
Paper
•
2310.19820
•
Published
•
1
Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication
Paper
•
2310.03188
•
Published
•
1
A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Paper
•
2310.08797
•
Published
•
1
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
Paper
•
2012.15828
•
Published
•
1
Self-Distillation for Further Pre-training of Transformers
Paper
•
2210.02871
•
Published
•
1
Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems
Paper
•
2111.03842
•
Published
•
1
Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
Paper
•
2304.06461
•
Published
•
1
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Paper
•
2310.15594
•
Published
•
1
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Paper
•
2002.10957
•
Published
•
1
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation
Paper
•
2303.05668
•
Published
•
1
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
Paper
•
2305.17394
•
Published
•
1
BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation
Paper
•
2306.08075
•
Published
•
1
Prototype-guided Cross-task Knowledge Distillation for Large-scale Models
Paper
•
2212.13180
•
Published
•
1
ProKD: An Unsupervised Prototypical Knowledge Distillation Network for Zero-Resource Cross-Lingual Named Entity Recognition
Paper
•
2301.08855
•
Published
•
1
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Paper
•
2305.17651
•
Published
•
1
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Paper
•
2305.11685
•
Published
•
2
Large Language Model Distillation Doesn't Need a Teacher
Paper
•
2305.14864
•
Published
•
3
One Student Knows All Experts Know: From Sparse to Dense
Paper
•
2201.10890
•
Published
•
1
BD-KD: Balancing the Divergences for Online Knowledge Distillation
Paper
•
2212.12965
•
Published
•
1
Rethinking Momentum Knowledge Distillation in Online Continual Learning
Paper
•
2309.02870
•
Published
•
1
Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer
Paper
•
2211.00789
•
Published
•
1
Preserving Linear Separability in Continual Learning by Backward Feature Projection
Paper
•
2303.14595
•
Published
•
2
Big-model Driven Few-shot Continual Learning
Paper
•
2309.00862
•
Published
•
1
Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
Paper
•
2210.11768
•
Published
•
1
Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study
Paper
•
2211.03946
•
Published
•
1
What Makes a "Good" Data Augmentation in Knowledge Distillation -- A Statistical Perspective
Paper
•
2012.02909
•
Published
•
1
Group channel pruning and spatial attention distilling for object detection
Paper
•
2306.01526
•
Published
•
1
Structured Pruning Learns Compact and Accurate Models
Paper
•
2204.00408
•
Published
•
1
MPCFormer: fast, performant and private Transformer inference with MPC
Paper
•
2211.01452
•
Published
•
1
Towards Teachable Conversational Agents
Paper
•
2102.10387
•
Published
•
1
OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking
Paper
•
2311.09758
•
Published
•
1
Task-Specific Expert Pruning for Sparse Mixture-of-Experts
Paper
•
2206.00277
•
Published
•
1
Augmented Large Language Models with Parametric Knowledge Guiding
Paper
•
2305.04757
•
Published
•
2
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Paper
•
2111.04130
•
Published
•
1
Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval
Paper
•
2308.04711
•
Published
•
1