mbreuss
/

MoDE_CALVIN_ABC_2

Robotics

custom

diffusion

mixture-of-experts

multi-modal


mbreuss committed on Dec 16, 2024

Commit

70d252c

verified

1 Parent(s): fd95cfe

Upload folder using huggingface_hub


Files changed (1)

README.md +48 -48


@@ -1,64 +1,64 @@
 ---
-                library_name: custom
-                tags:
-                - robotics
-                - diffusion
-                - mixture-of-experts
-                - multi-modal
-                license: mit
-                datasets:
-                - CALVIN
-                languages:
-                - en
-                pipeline_tag: robotics
                 ---
-    # MoDE (Mixture of Denoising Experts) Diffusion Policy
-    ## Model Description
-    This model implements a Mixture of Diffusion Experts architecture for robotic manipulation, combining transformer-based backbone with noise-only expert routing. For faster inference, we can precache the chosen expert for each timestep to reduce computation time.
-    The model has been pretrained on a subset of OXE for 300k steps and finetuned for downstream tasks on the CALVIN/LIBERO dataset.
-    ## Model Details
-    ### Architecture
-    - **Base Architecture**: MoDE with custom Mixture of Experts Transformer
-    - **Vision Encoder**: ResNet-50 with FiLM conditioning finetuned from ImageNet
-    - **EMA**: Enabled
-    - **Action Window Size**: 10
-    - **Sampling Steps**: 5 (optimal for performance)
-    - **Sampler Type**: DDIM
-    ### Input/Output Specifications
-    #### Inputs
-    - RGB Static Camera: `(B, T, 3, H, W)` tensor
-    - RGB Gripper Camera: `(B, T, 3, H, W)` tensor
-    - Language Instructions: Text strings
-    #### Outputs
-    - Action Space: `(B, T, 7)` tensor representing delta EEF actions
-    ## Usage
-    ```python
-    obs = {
-        "rgb_obs": {
-            "rgb_static": static_image,
-            "rgb_gripper": gripper_image
-        }
     }
-    goal = {"lang_text": "pick up the blue cube"}
-    action = model.step(obs, goal)
-    ```
-    ## Training Details
-    ### Configuration
-    - **Optimizer**: AdamW
-    - **Learning Rate**: {config.optimizer.learning_rate}
-    - **Weight Decay**: {config.optimizer.transformer_weight_decay}
-    ## License
-    This model is released under the MIT license.

 ---
+library_name: custom
+tags:
+- robotics
+- diffusion
+- mixture-of-experts
+- multi-modal
+license: mit
+datasets:
+- CALVIN
+languages:
+- en
+pipeline_tag: robotics
                 ---
+# MoDE (Mixture of Denoising Experts) Diffusion Policy
+## Model Description
+This model implements a Mixture of Diffusion Experts architecture for robotic manipulation, combining transformer-based backbone with noise-only expert routing. For faster inference, we can precache the chosen expert for each timestep to reduce computation time.
+The model has been pretrained on a subset of OXE for 300k steps and finetuned for downstream tasks on the CALVIN/LIBERO dataset.
+## Model Details
+### Architecture
+- **Base Architecture**: MoDE with custom Mixture of Experts Transformer
+- **Vision Encoder**: ResNet-50 with FiLM conditioning finetuned from ImageNet
+- **EMA**: Enabled
+- **Action Window Size**: 10
+- **Sampling Steps**: 5 (optimal for performance)
+- **Sampler Type**: DDIM
+### Input/Output Specifications
+#### Inputs
+- RGB Static Camera: `(B, T, 3, H, W)` tensor
+- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
+- Language Instructions: Text strings
+#### Outputs
+- Action Space: `(B, T, 7)` tensor representing delta EEF actions
+## Usage
+```python
+obs = {
+    "rgb_obs": {
+        "rgb_static": static_image,
+        "rgb_gripper": gripper_image
     }
+}
+goal = {"lang_text": "pick up the blue cube"}
+action = model.step(obs, goal)
+```
+## Training Details
+### Configuration
+- **Optimizer**: AdamW
+- **Learning Rate**: {config.optimizer.learning_rate}
+- **Weight Decay**: {config.optimizer.transformer_weight_decay}
+## License
+This model is released under the MIT license.
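
The README's model description says routing depends only on the noise level and that the chosen expert can be precached for each timestep. The following is a minimal sketch of that idea, not the MoDE implementation: because the router sees only the noise level, and the sampler visits a fixed list of noise levels (5 steps in this card), the routing decision can be computed once and reused for every rollout. The functions `noise_embedding`, `route`, and `precache_experts`, the sinusoidal embedding, the expert count, `top_k`, and the example noise levels are all hypothetical and chosen for illustration.

```python
import math
import random


def noise_embedding(sigma, dim=8):
    """Embed a scalar noise level as a sinusoidal vector (assumed embedding)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(sigma * f) for f in freqs] + [math.cos(sigma * f) for f in freqs]


def route(embedding, router_weights, top_k=2):
    """Return the indices of the top_k experts ranked by router logit."""
    logits = [sum(w * e for w, e in zip(row, embedding)) for row in router_weights]
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return tuple(ranked[:top_k])


def precache_experts(sigmas, router_weights, top_k=2):
    """Run the router once per sampling step and store the selected experts."""
    dim = len(router_weights[0])
    return {
        step: route(noise_embedding(sigma, dim), router_weights, top_k)
        for step, sigma in enumerate(sigmas)
    }


# Hypothetical setup: 4 experts, 8-dim noise embedding, 5 sampling steps.
NUM_EXPERTS = 4
EMBED_DIM = 8
rng = random.Random(0)
router_weights = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_EXPERTS)
]
sigmas = [80.0, 10.0, 2.0, 0.5, 0.05]  # assumed noise schedule, one per step

# Computed once; at inference, step i looks up expert_cache[i] instead of
# evaluating the router again.
expert_cache = precache_experts(sigmas, router_weights, top_k=2)
```

With the cache in place, each denoising step only needs a dictionary lookup to know which experts to run, which is where the inference saving would come from under these assumptions.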