---
library_name: custom
tags:
- robotics
- diffusion
- mixture-of-experts
- multi-modal
license: mit
datasets:
- CALVIN
language:
- en
pipeline_tag: robotics
---
# MoDE (Mixture of Denoising Experts) Diffusion Policy
## Model Description
This model implements a Mixture of Denoising Experts (MoDE) architecture for robotic manipulation, combining a transformer-based backbone with noise-only expert routing, in which experts are selected from the noise level alone. Because the routing does not depend on the observations, the chosen expert for each timestep can be precached to reduce inference time.
The model was pretrained on a subset of Open X-Embodiment (OXE) for 300k steps and finetuned for downstream tasks on the CALVIN/LIBERO datasets.
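The expert precaching described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the router is a hypothetical stand-in function, and the number of experts, the top-k value, and the noise levels are placeholders rather than values from the released model. The point is only that a router which sees nothing but the noise level can be evaluated once per sampling step and then replaced by a lookup.

```python
import math

NUM_EXPERTS = 4  # assumption: illustrative value, not taken from the model
TOP_K = 2        # assumption: illustrative value, not taken from the model


def toy_router_logits(sigma):
    """Hypothetical stand-in for a learned router that sees only the noise level."""
    return [math.sin((i + 1) * math.log(sigma)) for i in range(NUM_EXPERTS)]


def top_k_experts(logits, k=TOP_K):
    """Return the indices of the k largest logits, highest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def precache_routing(sigmas):
    """Run the router once per noise level and store the chosen experts."""
    return [top_k_experts(toy_router_logits(sigma)) for sigma in sigmas]


# One noise level per sampling step (placeholder values).
sigmas = [80.0, 10.0, 1.0, 0.1, 0.01]
routing_cache = precache_routing(sigmas)


def experts_for_step(step_index):
    """At inference time, read the cached choice instead of calling the router."""
    return routing_cache[step_index]
```

Since the cache is indexed by sampling step, it stays valid for every observation and instruction as long as the noise schedule is unchanged.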
## Model Details
### Architecture
- **Base Architecture**: MoDE with custom Mixture of Experts Transformer
- **Vision Encoder**: ResNet-50 with FiLM conditioning, finetuned from ImageNet-pretrained weights
- **EMA**: Enabled
- **Action Window Size**: 10
- **Sampling Steps**: 5 (optimal for performance)
- **Sampler Type**: DDIM
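
To make the sampler settings above concrete, here is a minimal sketch of deterministic DDIM sampling (eta = 0) over five steps. It uses the textbook discrete-time formulation with cumulative signal levels `abars`; this parametrization, the schedule values, and the oracle noise predictor are assumptions for illustration and may differ from what the released model uses internally.

```python
import math


def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0) on a flat list of floats."""
    out = []
    for x, e in zip(x_t, eps_pred):
        # Estimate the clean sample, then re-noise it to the next level.
        x0 = (x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
        out.append(math.sqrt(abar_prev) * x0 + math.sqrt(1.0 - abar_prev) * e)
    return out


def ddim_sample(eps_model, x_T, abars):
    """Run DDIM along `abars`, which rises from noisy (near 0) to clean (1.0)."""
    x = x_T
    for abar_t, abar_prev in zip(abars[:-1], abars[1:]):
        x = ddim_step(x, eps_model(x, abar_t), abar_t, abar_prev)
    return x


# Toy check with an oracle noise predictor that knows the clean target.
clean = [0.5, -1.0, 2.0]
noise = [0.3, -0.7, 1.1]
abars = [0.01, 0.1, 0.4, 0.7, 0.95, 1.0]  # 6 levels give 5 sampling steps


def oracle_eps(x, abar):
    """Recover the exact noise, which a trained network would only approximate."""
    return [(xi - math.sqrt(abar) * c) / math.sqrt(1.0 - abar) for xi, c in zip(x, clean)]


x_T = [math.sqrt(abars[0]) * c + math.sqrt(1.0 - abars[0]) * n for c, n in zip(clean, noise)]
recovered = ddim_sample(oracle_eps, x_T, abars)
```

In a real policy the vector being denoised would be the flattened action window (10 actions of 7 dimensions each), and `eps_model` would be the conditioned network rather than an oracle.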
### Input/Output Specifications
#### Inputs
- RGB Static Camera: `(B, T, 3, H, W)` tensor
- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
- Language Instructions: Text strings
#### Outputs
- Action Space: `(B, T, 7)` tensor representing delta EEF actions
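
The layouts above can be checked before calling the model. The sketch below builds random placeholder inputs and validates them; the helper names, the image sizes, and the use of NumPy arrays in place of framework tensors are assumptions for illustration, not part of the released code.

```python
import numpy as np

ACTION_DIM = 7  # from the output spec: delta EEF actions


def make_dummy_batch(batch=2, steps=1, static_hw=(200, 200), gripper_hw=(84, 84)):
    """Build random inputs with the documented layouts (sizes are placeholders)."""
    obs = {
        "rgb_obs": {
            "rgb_static": np.random.rand(batch, steps, 3, *static_hw).astype(np.float32),
            "rgb_gripper": np.random.rand(batch, steps, 3, *gripper_hw).astype(np.float32),
        }
    }
    goal = {"lang_text": "pick up the blue cube"}
    return obs, goal


def check_shapes(obs, goal, actions=None):
    """Raise ValueError if inputs (or optional actions) break the documented layout."""
    for name, img in obs["rgb_obs"].items():
        if img.ndim != 5 or img.shape[2] != 3:
            raise ValueError(f"{name}: expected (B, T, 3, H, W), got {img.shape}")
    if not isinstance(goal["lang_text"], str):
        raise ValueError("lang_text must be a string")
    if actions is not None and (actions.ndim != 3 or actions.shape[-1] != ACTION_DIM):
        raise ValueError(f"actions: expected (B, T, {ACTION_DIM}), got {actions.shape}")


obs, goal = make_dummy_batch()
check_shapes(obs, goal, actions=np.zeros((2, 10, ACTION_DIM), dtype=np.float32))
```

A check like this is cheap to run once per episode and catches channel-last images or a missing time dimension early.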
## Usage
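```python
# `model` is assumed to be an already-loaded MoDE policy.
# `static_image` and `gripper_image` are `(B, T, 3, H, W)` RGB tensors.
obs = {
    "rgb_obs": {
        "rgb_static": static_image,
        "rgb_gripper": gripper_image,
    }
}
# Natural-language instruction describing the task.
goal = {"lang_text": "pick up the blue cube"}
# Predict the next action from the current observation and goal.
action = model.step(obs, goal)
```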
## Training Details
### Configuration
- **Optimizer**: AdamW
- **Learning Rate**: {config.optimizer.learning_rate}
- **Weight Decay**: {config.optimizer.transformer_weight_decay}
## License
This model is released under the MIT license.