mbreuss committed
Commit 70d252c
1 Parent(s): fd95cfe

Upload folder using huggingface_hub

Files changed (1): README.md (+48 -48)

README.md

---
library_name: custom
tags:
- robotics
- diffusion
- mixture-of-experts
- multi-modal
license: mit
datasets:
- CALVIN
languages:
- en
pipeline_tag: robotics
---

# MoDE (Mixture of Denoising Experts) Diffusion Policy

## Model Description

This model implements a Mixture of Denoising Experts (MoDE) architecture for robotic manipulation, combining a transformer-based backbone with noise-only expert routing. Because routing depends only on the noise level, the chosen expert for each denoising timestep can be precached, reducing computation at inference time.

The model was pretrained on a subset of the Open X-Embodiment (OXE) dataset for 300k steps and finetuned for downstream tasks on the CALVIN/LIBERO benchmarks.

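As a rough illustration of this precaching idea (a minimal sketch only; `router`, its call signature, and top-1 routing are assumptions rather than the model's actual API):

```python
import torch

def precache_expert_routes(router, timesteps):
    """Build a {timestep: expert_index} lookup for noise-only routing.

    Assumes `router` maps a noise level / timestep to per-expert logits
    (hypothetical interface) and that a single expert is picked per step.
    Since routing ignores the observation, the table can be computed once
    and reused for every inference call.
    """
    cache = {}
    with torch.no_grad():
        for t in timesteps:
            logits = router(torch.tensor([float(t)]))
            cache[int(t)] = int(logits.argmax(dim=-1))
    return cache

# e.g. cache the expert choice for the 5 DDIM sampling steps used by this model
# (`model.router` is a hypothetical handle):
# expert_cache = precache_expert_routes(model.router, range(5))
```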
 
## Model Details

### Architecture
- **Base Architecture**: MoDE with a custom Mixture-of-Experts Transformer
- **Vision Encoder**: ResNet-50 with FiLM conditioning, finetuned from ImageNet
- **EMA**: Enabled
- **Action Window Size**: 10
- **Sampling Steps**: 5 (optimal for performance)
- **Sampler Type**: DDIM
 
### Input/Output Specifications

#### Inputs
- RGB Static Camera: `(B, T, 3, H, W)` tensor
- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
- Language Instructions: Text strings

#### Outputs
- Action Space: `(B, T, 7)` tensor representing delta EEF actions (see the shape sketch below)

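A minimal sketch of observation tensors matching the shapes above (the batch size, horizon, and image resolutions here are placeholder assumptions, not requirements of the model):

```python
import torch

B, T = 1, 1                                     # one observation window
static_image = torch.zeros(B, T, 3, 224, 224)   # RGB static camera; H, W are placeholders
gripper_image = torch.zeros(B, T, 3, 84, 84)    # RGB gripper camera; H, W are placeholders
# The policy's output for such an input would be a (B, T, 7) delta EEF action tensor.
```

These tensors are what the usage example below expects for `rgb_static` and `rgb_gripper`.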
 
## Usage

A minimal inference call (assumes `model` is a loaded MoDE policy and `static_image` / `gripper_image` are camera tensors with the shapes listed above):

```python
obs = {
    "rgb_obs": {
        "rgb_static": static_image,    # (B, T, 3, H, W) static camera frames
        "rgb_gripper": gripper_image   # (B, T, 3, H, W) gripper camera frames
    }
}
goal = {"lang_text": "pick up the blue cube"}
action = model.step(obs, goal)         # (B, T, 7) delta EEF actions
```

## Training Details

### Configuration
- **Optimizer**: AdamW (see the sketch after this list)
- **Learning Rate**: {config.optimizer.learning_rate}
- **Weight Decay**: {config.optimizer.transformer_weight_decay}

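As a hedged sketch of how this configuration maps onto an optimizer (the learning-rate and weight-decay values below are illustrative placeholders standing in for the unfilled config fields above, and `model` is the loaded policy):

```python
import torch

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # placeholder for {config.optimizer.learning_rate}
    weight_decay=0.05,  # placeholder for {config.optimizer.transformer_weight_decay}
)
```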
 
## License

This model is released under the MIT license.