---
library_name: custom
tags:
- robotics
- diffusion
- mixture-of-experts
- multi-modal
license: mit
datasets:
- CALVIN
language:
- en
pipeline_tag: robotics
---

# MoDE (Mixture of Diffusion Experts) Model

This model implements a Mixture of Diffusion Experts architecture for robotic manipulation, combining transformer-based processing with expert routing and diffusion-based action prediction.

## Model Architecture

- Base Architecture: MoDE with a custom Mixture-of-Experts transformer
- Vision Encoder: {getattr(model_instance, 'resnet_type', 'ResNet')} with FiLM conditioning
- EMA: Enabled
- Action Window Size: {model_instance.act_window_size}
- Sampling Steps: {model_instance.num_sampling_steps}
- Sampler Type: {model_instance.sampler_type}
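
The expert-routing component above can be illustrated with a minimal top-k gating sketch. This is a generic illustration of the technique, not the actual MoDE routing code; `TopKRouter` and its dimensions are hypothetical names introduced here.

```python
import torch
import torch.nn as nn

class TopKRouter(nn.Module):
    """Illustrative top-k gate: picks k experts per token (hypothetical, not MoDE's internals)."""
    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.k = k

    def forward(self, x):
        # x: (B, T, d_model) -> gating weights and expert indices per token
        logits = self.gate(x)                       # (B, T, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = torch.softmax(topk_vals, dim=-1)  # normalize over the k chosen experts
        return weights, topk_idx

router = TopKRouter(d_model=16, num_experts=4, k=2)
weights, indices = router(torch.randn(2, 5, 16))   # weights/indices: (2, 5, 2)
```

In a full MoE layer, each token's output would be the weighted sum of the k selected experts' outputs.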

## Input/Output Specifications

- RGB Static Camera: (B, T, 3, H, W) tensor
- RGB Gripper Camera: (B, T, 3, H, W) tensor
- Language Instructions: Text strings
- Output: (B, T, 7) tensor representing 7-DoF actions
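
A quick way to sanity-check these shapes is to build dummy tensors matching the specification above. The 224x224 resolution is an assumption for illustration; the actual H and W come from the model configuration.

```python
import torch

# Batch of 1, observation window of 1; H = W = 224 is assumed here.
B, T, H, W = 1, 1, 224, 224
static_image = torch.zeros(B, T, 3, H, W)   # RGB static camera input
gripper_image = torch.zeros(B, T, 3, H, W)  # RGB gripper camera input
# The model's output for these inputs would be a (B, T, 7) action tensor.
```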

## Usage Example

```python
from huggingface_hub import hf_hub_download
import torch

# `model` is assumed to be an already-constructed MoDE instance;
# this card does not show how to instantiate it.
weights_path = hf_hub_download(repo_id="{repo_name}", filename="model_cleaned.safetensors")
model.load_pretrained_parameters(weights_path)

obs = {
    "rgb_obs": {
        "rgb_static": static_image,
        "rgb_gripper": gripper_image
    }
}
goal = {"lang_text": "pick up the blue cube"}
action = model.step(obs, goal)
```

## Training Configuration

- Optimizer: AdamW
- Learning Rate: {config.optimizer.learning_rate}
- Weight Decay: {config.optimizer.transformer_weight_decay}
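
As a minimal sketch, an AdamW optimizer with these hyperparameters might be set up as follows. The numeric values are stand-ins for the unresolved config fields above, and the linear layer is a placeholder for the actual MoDE transformer.

```python
import torch

model = torch.nn.Linear(8, 7)  # stand-in for the MoDE transformer (hypothetical)
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # placeholder for config.optimizer.learning_rate
    weight_decay=0.05,  # placeholder for config.optimizer.transformer_weight_decay
)
```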
|