---
license: mit
language:
- en
pipeline_tag: robotics
---
|
# Model Card for TRANSIC Policies
|
|
|
|
|
|
This model card accompanies the [CoRL 2024 paper](https://arxiv.org/abs/2405.10315) *TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction*.

It includes robot policies trained in simulation and transferred to the real world for complex, contact-rich manipulation tasks.
|
|
|
## Model Details
|
|
|
### Model Description
|
|
|
|
This model repository includes three parts: 1) teacher policies trained in simulation with reinforcement learning; 2) student policies distilled from successful trajectories generated by the teacher policies; and 3) residual policies learned in the real world to augment the simulation policies.
|
|
|
The first part can be found in the `rl` directory, which contains RL teacher policies for 8 different tasks. The second part can be found in the `student` directory, which contains 5 student policies corresponding to the 5 skills used to assemble the square table from [FurnitureBench](https://github.com/clvrai/furniture-bench). The third part can be found in the `residual` directory; these residual policies augment the 5 student policies (the simulation base policies).
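
As a minimal sketch of how a checkpoint from this repository might be fetched and loaded (the `repo_id` and `filename` below are illustrative assumptions, not confirmed paths; see the directories above for the actual layout):

```python
# Minimal sketch: fetch one checkpoint from this repository and load it.
# The repo_id and filename are illustrative assumptions, not confirmed paths.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="transic-robot/models",  # assumed repository id
    filename="rl/stabilize.ckpt",    # hypothetical file in the `rl` directory
)
checkpoint = torch.load(ckpt_path, map_location="cpu")
```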
|
|
|
|
|
- **Developed by:** [Yunfan Jiang](https://yunfanj.com/)

- **Model type:** [PyTorch Checkpoints](https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html)

- **License:** [MIT](LICENSE)
|
|
|
### Model Sources
|
|
|
- **Repositories:** [TRANSIC](https://github.com/transic-robot/transic), [TRANSIC-Envs](https://github.com/transic-robot/transic-envs)

- **Paper:** [TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction](https://arxiv.org/abs/2405.10315)

- **Demo:** Provided on our [website](https://transic-robot.github.io/)
|
|
|
## Uses & How to Get Started with the Model
|
|
|
Please see our [codebase](https://github.com/transic-robot/transic) for detailed usage.
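
For a quick, hedged illustration of inspecting a downloaded checkpoint before wiring it into the codebase (the file path is a placeholder, and the key layout is an assumption; PyTorch checkpoints are typically dictionaries of tensors and metadata):

```python
# Sketch: inspect a policy checkpoint's top-level structure.
# "path/to/checkpoint.ckpt" is a placeholder; key names will vary.
import torch

checkpoint = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
if isinstance(checkpoint, dict):
    for key, value in checkpoint.items():
        # State dicts map parameter names to tensors; other keys may hold
        # optimizer state, configs, or training step counters.
        print(key, type(value).__name__)
```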
|
|
|
## Training Details
|
|
|
### Training Data
|
|
|
We provide training data in our [🤗Hugging Face data repository](https://huggingface.co/datasets/transic-robot/data).
|
|
|
### Training Procedure
|
|
|
1. Teacher policies are first trained from scratch with reinforcement learning in simulation.

2. We then roll out the teacher policies to generate successful trajectories.

3. The generated trajectories are used to train student policies through behavior cloning. Student policies take point-cloud and proprioceptive observations and output joint actions.

4. We then deploy the student policies on the real robot. A human operator monitors the execution, intervenes when necessary, and provides online correction through teleoperation. These teleoperation data are collected.

5. Finally, we use the collected correction data to learn residual policies, which augment the simulation policies for successful sim-to-real transfer (see the sketch after this list).
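
As a rough, non-authoritative sketch of the deployment-time composition in step 5 (the class names and interfaces below are hypothetical stand-ins; the actual policy APIs live in the TRANSIC codebase and may differ):

```python
# Hedged sketch of deployment-time action composition for sim-to-real transfer.
# `student_policy` and `residual_policy` are hypothetical stand-ins for the
# real modules in the TRANSIC codebase.
import torch

@torch.no_grad()
def act(student_policy, residual_policy, obs):
    base_action = student_policy(obs)                     # simulation-trained policy
    residual_action = residual_policy(obs, base_action)   # learned from human corrections
    return base_action + residual_action                  # corrected action sent to the robot
```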
|
|
|
## Evaluation
|
|
|
Policies are evaluated both in simulation and in the real world, with task success rate as the metric.
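
As a trivial illustration of the metric (purely for clarity; the actual evaluation harness is in the codebase), success rate is the fraction of evaluation episodes in which the task is completed:

```python
# Success rate over a batch of evaluation episodes (illustrative only).
def success_rate(episode_outcomes: list[bool]) -> float:
    return sum(episode_outcomes) / len(episode_outcomes)
```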
|
|
|
## Citation
|
|
|
**BibTeX:**
|
|
|
```bibtex
@inproceedings{jiang2024transic,
  title     = {TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction},
  author    = {Yunfan Jiang and Chen Wang and Ruohan Zhang and Jiajun Wu and Li Fei-Fei},
  booktitle = {Conference on Robot Learning},
  year      = {2024}
}
```
|
|
|
## Model Card Contact
|
|
|
[Yunfan Jiang](https://yunfanj.com/), email: `yunfanj[at]cs[dot]stanford[dot]edu`