mbreuss/MoDE_CALVIN_ABC_2

mbreuss committed on Dec 16, 2024

Commit fd95cfe (verified) · 1 parent: 854a554

Upload folder using huggingface_hub

Files changed (2)

README.md +63 -55
model_cleaned.safetensors +1 -1

README.md CHANGED

@@ -1,56 +1,64 @@
-            ---
-            library_name: custom
-            tags:
-            - robotics
-            - diffusion
-            - mixture-of-experts
-            - multi-modal
-            license: mit
-            datasets:
-            - CALVIN
-            language:
-            - en
-            pipeline_tag: robotics
-            ---
-            # MoDE (Mixture 1of Diffusion Experts) Model
-            This model implements a Mixture of Diffusion Experts architecture for robotic manipulation, combining transformer-based processing with expert routing and diffusion-based action prediction.
-            ## Model Architecture
-            - Base Architecture: MoDE with custom Mixture of Experts Transformer
-            - Vision Encoder: {getattr(model_instance, 'resnet_type', 'ResNet')} with FiLM conditioning
-            - EMA: Enabled
-            - Action Window Size: {model_instance.act_window_size}
-            - Sampling Steps: {model_instance.num_sampling_steps}
-            - Sampler Type: {model_instance.sampler_type}
-            ## Input/Output Specifications
-            - RGB Static Camera: (B, T, 3, H, W) tensor
-            - RGB Gripper Camera: (B, T, 3, H, W) tensor
-            - Language Instructions: Text strings
-            - Output: (B, T, 7) tensor representing 7-DoF actions
-            ## Usage Example
-            ```python
-            from huggingface_hub import hf_hub_download
-            import torch
-            weights_path = hf_hub_download(repo_id="{repo_name}", filename="model_cleaned.safetensors")
-            model.load_pretrained_parameters(weights_path)
-            obs = {
-                "rgb_obs": {
-                    "rgb_static": static_image,
-                    "rgb_gripper": gripper_image
-                }
-            }
-            goal = {"lang_text": "pick up the blue cube"}
-            action = model.step(obs, goal)
-            ```
-            ## Training Configuration
-            - Optimizer: AdamW
-            - Learning Rate: {config.optimizer.learning_rate}
-            - Weight Decay: {config.optimizer.transformer_weight_decay}

+---
+                library_name: custom
+                tags:
+                - robotics
+                - diffusion
+                - mixture-of-experts
+                - multi-modal
+                license: mit
+                datasets:
+                - CALVIN
+                languages:
+                - en
+                pipeline_tag: robotics
+                ---
+    # MoDE (Mixture of Denoising Experts) Diffusion Policy
+    ## Model Description
+    This model implements a Mixture of Denoising Experts architecture for robotic manipulation, combining a transformer-based backbone with noise-only expert routing. For faster inference, the expert chosen for each timestep can be precached to reduce computation time.
+    The model has been pretrained on a subset of OXE for 300k steps and finetuned for downstream tasks on the CALVIN/LIBERO dataset.
+    ## Model Details
+    ### Architecture
+    - **Base Architecture**: MoDE with custom Mixture of Experts Transformer
+    - **Vision Encoder**: ResNet-50 with FiLM conditioning finetuned from ImageNet
+    - **EMA**: Enabled
+    - **Action Window Size**: 10
+    - **Sampling Steps**: 5 (optimal for performance)
+    - **Sampler Type**: DDIM
+    ### Input/Output Specifications
+    #### Inputs
+    - RGB Static Camera: `(B, T, 3, H, W)` tensor
+    - RGB Gripper Camera: `(B, T, 3, H, W)` tensor
+    - Language Instructions: Text strings
+    #### Outputs
+    - Action Space: `(B, T, 7)` tensor representing delta EEF actions
+    ## Usage
+    ```python
+    obs = {
+        "rgb_obs": {
+            "rgb_static": static_image,
+            "rgb_gripper": gripper_image
+        }
+    }
+    goal = {"lang_text": "pick up the blue cube"}
+    action = model.step(obs, goal)
+    ```
+    ## Training Details
+    ### Configuration
+    - **Optimizer**: AdamW
+    - **Learning Rate**: {config.optimizer.learning_rate}
+    - **Weight Decay**: {config.optimizer.transformer_weight_decay}
+    ## License
+    This model is released under the MIT license.
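
The usage snippet in the updated README leaves `model`, `static_image`, and `gripper_image` undefined. The following is a minimal sketch of an input check based only on the shapes listed under "Input/Output Specifications". The helper name `check_inputs`, the stand-in objects, the image sizes, and the requirement that both cameras share `(B, T)` are assumptions for illustration and are not part of the repository.

```python
from types import SimpleNamespace


def check_inputs(obs, goal):
    """Check obs/goal against the shapes listed in the README.

    Returns (B, T) if both camera inputs look like (B, T, 3, H, W).
    Hypothetical helper; not part of the MoDE code base.
    """
    cameras = obs["rgb_obs"]
    shapes = {}
    for key in ("rgb_static", "rgb_gripper"):
        shape = tuple(cameras[key].shape)
        if len(shape) != 5 or shape[2] != 3:
            raise ValueError(f"{key}: expected (B, T, 3, H, W), got {shape}")
        shapes[key] = shape
    # Assumption: both cameras share the batch and time dimensions.
    if shapes["rgb_static"][:2] != shapes["rgb_gripper"][:2]:
        raise ValueError("rgb_static and rgb_gripper disagree on (B, T)")
    text = goal.get("lang_text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("goal['lang_text'] must be a non-empty string")
    return shapes["rgb_static"][:2]


# Stand-ins exposing only `.shape`; real inputs would be torch tensors.
static_image = SimpleNamespace(shape=(1, 1, 3, 224, 224))  # hypothetical H, W
gripper_image = SimpleNamespace(shape=(1, 1, 3, 84, 84))   # hypothetical H, W

obs = {"rgb_obs": {"rgb_static": static_image, "rgb_gripper": gripper_image}}
goal = {"lang_text": "pick up the blue cube"}
batch, steps = check_inputs(obs, goal)
```

Because the check only reads `.shape`, it should work unchanged with torch tensors passed to `model.step(obs, goal)`.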

model_cleaned.safetensors CHANGED Viewed

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bcf93362ced811101ff755dac7a9e85267cf76f933f4ad847edecac7be71d9a3
+oid sha256:700cf4e5f7b248d313413b5988377e32cf8ac6eef149c624ae5a9e2c10705b32
 size 3317019856
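
The pointer diff shows that the weights file kept its size (3317019856 bytes) while its SHA-256 changed, so a previously downloaded copy can only be told apart by hash. As a rough sketch using the Python standard library, a local file could be checked against the new pointer as follows. The function names and the local path in the usage note are hypothetical.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents after this commit, copied from the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:700cf4e5f7b248d313413b5988377e32cf8ac6eef149c624ae5a9e2c10705b32
size 3317019856
"""

pointer = parse_lfs_pointer(NEW_POINTER)
```

Calling `matches_pointer("model_cleaned.safetensors", pointer)` on a local download (path assumed) would then indicate whether the file corresponds to this commit or to the parent.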