---
library_name: custom
tags:
- robotics
- diffusion
- mixture-of-experts
- multi-modal
license: mit
datasets:
- CALVIN
language:
- en
pipeline_tag: robotics
---

# MoDE (Mixture of Diffusion Experts) Model

This model implements a Mixture of Diffusion Experts architecture for robotic manipulation, combining transformer-based processing with expert routing and diffusion-based action prediction.

## Model Architecture

- Base Architecture: MoDE with a custom Mixture-of-Experts transformer
- Vision Encoder: {getattr(model_instance, 'resnet_type', 'ResNet')} with FiLM conditioning
- EMA: Enabled
- Action Window Size: {model_instance.act_window_size}
- Sampling Steps: {model_instance.num_sampling_steps}
- Sampler Type: {model_instance.sampler_type}
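
The expert-routing component above can be illustrated with a minimal top-k gating sketch. This is a generic illustration of the technique, not the actual MoDE routing code; `TopKRouter` and its dimensions are hypothetical names introduced here.

```python
import torch
import torch.nn as nn

class TopKRouter(nn.Module):
    """Illustrative top-k gate: picks k experts per token (hypothetical, not MoDE's internals)."""
    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.k = k

    def forward(self, x):
        # x: (B, T, d_model) -> gating weights and expert indices per token
        logits = self.gate(x)                       # (B, T, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = torch.softmax(topk_vals, dim=-1)  # normalize over the k chosen experts
        return weights, topk_idx

router = TopKRouter(d_model=16, num_experts=4, k=2)
weights, indices = router(torch.randn(2, 5, 16))   # weights/indices: (2, 5, 2)
```

In a full MoE layer, each token's output would be the weighted sum of the k selected experts' outputs.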

## Input/Output Specifications

- RGB Static Camera: (B, T, 3, H, W) tensor
- RGB Gripper Camera: (B, T, 3, H, W) tensor
- Language Instructions: Text strings
- Output: (B, T, 7) tensor representing 7-DoF actions
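
A quick way to sanity-check these shapes is to build dummy tensors matching the specification above. The 224x224 resolution is an assumption for illustration; the actual H and W come from the model configuration.

```python
import torch

# Batch of 1, observation window of 1; H = W = 224 is assumed here.
B, T, H, W = 1, 1, 224, 224
static_image = torch.zeros(B, T, 3, H, W)   # RGB static camera input
gripper_image = torch.zeros(B, T, 3, H, W)  # RGB gripper camera input
# The model's output for these inputs would be a (B, T, 7) action tensor.
```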

## Usage Example

```python
from huggingface_hub import hf_hub_download
import torch

# `model` is assumed to be an already-constructed MoDE instance;
# this card does not show how to instantiate it.
weights_path = hf_hub_download(repo_id="{repo_name}", filename="model_cleaned.safetensors")
model.load_pretrained_parameters(weights_path)

obs = {
    "rgb_obs": {
        "rgb_static": static_image,
        "rgb_gripper": gripper_image
    }
}
goal = {"lang_text": "pick up the blue cube"}
action = model.step(obs, goal)
```

## Training Configuration

- Optimizer: AdamW
- Learning Rate: {config.optimizer.learning_rate}
- Weight Decay: {config.optimizer.transformer_weight_decay}
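
As a minimal sketch, an AdamW optimizer with these hyperparameters might be set up as follows. The numeric values are stand-ins for the unresolved config fields above, and the linear layer is a placeholder for the actual MoDE transformer.

```python
import torch

model = torch.nn.Linear(8, 7)  # stand-in for the MoDE transformer (hypothetical)
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # placeholder for config.optimizer.learning_rate
    weight_decay=0.05,  # placeholder for config.optimizer.transformer_weight_decay
)
```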
|