---
library_name: custom
tags:
- robotics
- diffusion
- mixture-of-experts
- multi-modal
license: mit
datasets:
- CALVIN
language:
- en
pipeline_tag: robotics
---
# MoDE (Mixture of Diffusion Experts) Model
This model implements a Mixture of Diffusion Experts architecture for robotic manipulation, combining transformer-based processing with expert routing and diffusion-based action prediction.
## Model Architecture
- Base Architecture: MoDE with custom Mixture of Experts Transformer
- Vision Encoder: {getattr(model_instance, 'resnet_type', 'ResNet')} with FiLM conditioning
- EMA: Enabled
- Action Window Size: {model_instance.act_window_size}
- Sampling Steps: {model_instance.num_sampling_steps}
- Sampler Type: {model_instance.sampler_type}
## Input/Output Specifications
- RGB Static Camera: (B, T, 3, H, W) tensor
- RGB Gripper Camera: (B, T, 3, H, W) tensor
- Language Instructions: Text strings
- Output: (B, T, 7) tensor representing 7-DoF actions
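
The shapes listed above can be checked before and after calling the model. The following is a minimal sketch, not part of the MoDE codebase: `check_obs_shapes` and `check_action_shape` are hypothetical helpers, the resolutions in the example are illustrative, and the sketch assumes both cameras share the same batch and time dimensions. It only reads `.shape`, so it should work with `torch.Tensor` inputs; lightweight stand-ins are used here to keep the example dependency-free.

```python
from types import SimpleNamespace

CAMERA_KEYS = ("rgb_static", "rgb_gripper")
ACTION_DIM = 7  # 7-DoF actions, per the output specification


def check_obs_shapes(obs):
    """Validate that both camera inputs are (B, T, 3, H, W); return (B, T)."""
    batch_time = None
    for key in CAMERA_KEYS:
        shape = tuple(obs["rgb_obs"][key].shape)
        if len(shape) != 5 or shape[2] != 3:
            raise ValueError(f"{key}: expected (B, T, 3, H, W), got {shape}")
        if batch_time is None:
            batch_time = shape[:2]
        elif shape[:2] != batch_time:
            # Assumption: both cameras share batch and time dimensions.
            raise ValueError(f"{key}: (B, T)={shape[:2]} differs from {batch_time}")
    return batch_time


def check_action_shape(action, batch_size):
    """Validate that the action is a 3-D (B, T, 7) array-like."""
    shape = tuple(action.shape)
    if len(shape) != 3 or shape[0] != batch_size or shape[2] != ACTION_DIM:
        raise ValueError(f"expected ({batch_size}, T, {ACTION_DIM}), got {shape}")


# Stand-ins exposing only `.shape`; real inputs would be tensors.
# The 200x200 and 84x84 resolutions are illustrative, not taken from this card.
obs = {
    "rgb_obs": {
        "rgb_static": SimpleNamespace(shape=(2, 1, 3, 200, 200)),
        "rgb_gripper": SimpleNamespace(shape=(2, 1, 3, 84, 84)),
    }
}
batch_time = check_obs_shapes(obs)
check_action_shape(SimpleNamespace(shape=(2, 1, 7)), batch_time[0])
```

Height and width are deliberately left unconstrained, since the two cameras may use different resolutions.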
## Usage Example
```python
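from huggingface_hub import hf_hub_download
import torch

# Download the released checkpoint from the Hub.
weights_path = hf_hub_download(repo_id="{repo_name}", filename="model_cleaned.safetensors")

# `model` is an already-constructed MoDE instance; building it is not shown here.
model.load_pretrained_parameters(weights_path)

# `static_image` and `gripper_image` are (B, T, 3, H, W) tensors.
obs = {
    "rgb_obs": {
        "rgb_static": static_image,
        "rgb_gripper": gripper_image,
    }
}
goal = {"lang_text": "pick up the blue cube"}

# Predict a (B, T, 7) tensor of 7-DoF actions.
action = model.step(obs, goal)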
```
## Training Configuration
- Optimizer: AdamW
- Learning Rate: {config.optimizer.learning_rate}
- Weight Decay: {config.optimizer.transformer_weight_decay}