Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
Abstract
Diffusion Policies have become widely used in Imitation Learning, offering several appealing properties, such as generating multimodal and discontinuous behavior. As models grow larger to capture more complex capabilities, their computational demands increase, as shown by recent scaling laws. Continuing with current architectures will therefore present a computational roadblock. To address this gap, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for Imitation Learning. MoDE surpasses current state-of-the-art Transformer-based Diffusion Policies while enabling parameter-efficient scaling through sparse experts and noise-conditioned routing, reducing active parameters by 40% and inference costs by 90% via expert caching. Our architecture combines this efficient scaling with a noise-conditioned self-attention mechanism, enabling more effective denoising across different noise levels. MoDE achieves state-of-the-art performance on 134 tasks in four established imitation learning benchmarks (CALVIN and LIBERO). Notably, by pretraining MoDE on diverse robotics data, we achieve 4.01 on CALVIN ABC and 0.95 on LIBERO-90. MoDE surpasses both CNN-based and Transformer Diffusion Policies by an average of 57% across the 4 benchmarks, while using 90% fewer FLOPs and fewer active parameters compared to default Diffusion Transformer architectures. Furthermore, we conduct comprehensive ablations on MoDE's components, providing insights for designing efficient and scalable Transformer architectures for Diffusion Policies. Code and demonstrations are available at https://mbreuss.github.io/MoDE_Diffusion_Policy/.
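The core idea in the abstract is that the router selects experts from the diffusion noise level rather than from token content, which is what makes expert caching possible: at a given noise level the expert choice is fixed for all tokens. A minimal numpy sketch of such noise-conditioned top-k routing is shown below; all names, shapes, and the sinusoidal embedding are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class NoiseConditionedMoE:
    """Toy mixture-of-experts layer routed by the diffusion noise level.

    Hypothetical stand-in for MoDE's expert layer: linear maps play the
    role of FFN experts, and routing depends only on sigma.
    """

    def __init__(self, d_model=32, n_experts=4, top_k=2):
        self.d_model = d_model
        self.top_k = top_k
        # Simple linear "experts" (stand-ins for feed-forward expert blocks).
        self.experts = [rng.normal(scale=0.02, size=(d_model, d_model))
                        for _ in range(n_experts)]
        # Router weights act on a noise-level embedding, not on the tokens.
        self.router = rng.normal(scale=0.02, size=(d_model, n_experts))

    def noise_embedding(self, sigma):
        # Sinusoidal embedding of the scalar noise level (a common choice).
        half = self.d_model // 2
        freqs = np.exp(-np.arange(half) / half)
        return np.concatenate([np.sin(sigma * freqs), np.cos(sigma * freqs)])

    def forward(self, tokens, sigma):
        emb = self.noise_embedding(sigma)
        logits = emb @ self.router                  # (n_experts,)
        top = np.argsort(logits)[-self.top_k:]     # chosen expert indices
        weights = softmax(logits[top])
        # Routing depends only on sigma, so the chosen experts are identical
        # for every token at this noise level -- the expert choice per
        # denoising step can therefore be precomputed and cached.
        out = sum(w * (tokens @ self.experts[i]) for w, i in zip(weights, top))
        return out, top

moe = NoiseConditionedMoE()
tokens = rng.normal(size=(5, 32))                  # 5 action tokens
out, chosen = moe.forward(tokens, sigma=0.7)
print(out.shape, sorted(chosen.tolist()))
```

Because the routing decision is a function of the denoising step alone, an inference engine can evaluate the router once per noise level and skip the unchosen experts entirely, which is the mechanism behind the reported inference-cost reduction.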
Community
Introducing a scalable and more efficient generalist Diffusion Transformer Policy that achieves state-of-the-art performance on the LIBERO, CALVIN, and SIMPLER benchmarks with fast and efficient training on a few GPUs.
Project Link: https://mbreuss.github.io/MoDE_Diffusion_Policy/
Code Link: https://mbreuss.github.io/MoDE_Diffusion_Policy/
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Diffusion Transformer Policy (2024)
- One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation (2024)
- Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression (2024)
- Mixture of Hidden-Dimensions Transformer (2024)
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization (2024)
- Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation (2024)
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks (2024)
Models citing this paper: 8
Datasets citing this paper: 0
Spaces citing this paper: 0