---
license: mit
language:
- en
pipeline_tag: robotics
---

# Model Card for TRANSIC Policies

This model card accompanies the [paper](https://arxiv.org/abs/2405.10315) titled *TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction*. It includes robot policies trained in simulation and transferred to the real world for complex and contact-rich manipulation tasks.

## Model Details

### Model Description

This model repository includes three parts: 1) teacher policies trained in simulation with reinforcement learning; 2) student policies distilled from successful trajectories generated by teacher policies; and 3) residual policies learned in the real world to augment simulation policies.

The first part can be found in the `rl` directory. We provide RL teacher policies for 8 different tasks. The second part can be found in the `student` directory. We provide 5 student policies corresponding to 5 skills used in assembling the square table from [FurnitureBench](https://github.com/clvrai/furniture-bench). The third part can be found in the `residual` directory. These residual policies augment the 5 simulation base policies (student policies).

- **Developed by:** [Yunfan Jiang](https://yunfanj.com/)
- **Model type:** [PyTorch Checkpoints](https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html)
- **License:** [MIT](LICENSE)

### Model Sources

- **Repositories:** [TRANSIC](https://github.com/transic-robot/transic), [TRANSIC-Envs](https://github.com/transic-robot/transic-envs)
- **Paper:** [TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction](https://arxiv.org/abs/2405.10315)
- **Demo:** Provided on our [website](https://transic-robot.github.io/)

## Uses & How to Get Started with the Model

Please see our [codebase](https://github.com/transic-robot/transic) for detailed usage. An illustrative download sketch is also included at the end of this card.

## Training Details

### Training Data

We provide training data in our [🤗 Hugging Face data repository](https://huggingface.co/datasets/transic-robot/data).

### Training Procedure

1. Teacher policies are first trained from scratch with reinforcement learning in simulation.
2. We then roll out teacher policies to generate successful trajectories.
3. These generated data are used to train student policies through behavior cloning. Student policies take point-cloud and proprioceptive observations and output joint actions (see the behavior-cloning sketch at the end of this card).
4. We then deploy student policies on the real robot. A human operator monitors the execution, intervenes when necessary, and provides online correction through teleoperation. Such teleoperation data are collected.
5. We use the collected correction data to learn residual policies, which then augment the simulation policies for successful sim-to-real transfer (see the deployment sketch at the end of this card).

## Evaluation

Policies are evaluated in both simulation and the real world. We use task success rate as the metric.

## Citation

**BibTeX:**

```
@article{jiang2024transic,
  title   = {TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction},
  author  = {Yunfan Jiang and Chen Wang and Ruohan Zhang and Jiajun Wu and Li Fei-Fei},
  year    = {2024},
  journal = {arXiv preprint arXiv:2405.10315}
}
```

## Model Card Contact

[Yunfan Jiang](https://yunfanj.com/), email: `yunfanj[at]cs[dot]stanford[dot]edu`
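
## Illustrative Sketches

The sketches below are not part of the official TRANSIC interface; they only illustrate, under stated assumptions, the pieces described above. First, a minimal sketch of fetching the checkpoints with `huggingface_hub` and inspecting one with `torch`. The `repo_id` and the checkpoint filename are assumptions for illustration; the [codebase](https://github.com/transic-robot/transic) documents the supported loading path.

```python
import torch
from huggingface_hub import snapshot_download

# Download every file in this model repository to a local cache.
# NOTE: the repo_id below is an assumption for illustration.
local_dir = snapshot_download(repo_id="transic-robot/models")

# Inspect one of the distilled student policies.
# NOTE: the subdirectory layout (`rl`, `student`, `residual`) is from this
# card, but the filename "stabilize.ckpt" is hypothetical.
ckpt = torch.load(f"{local_dir}/student/stabilize.ckpt", map_location="cpu")
print(ckpt.keys())  # typically model weights plus training metadata
```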
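Next, a sketch of the behavior-cloning update in step 3 of the training procedure: the student policy is regressed onto teacher actions for the same observations. The batch layout and the student policy's call signature are assumptions for illustration; the actual training loop lives in the TRANSIC codebase.

```python
import torch
import torch.nn.functional as F

def bc_step(student, optimizer, batch):
    """One behavior-cloning update: regress the student's predicted
    joint action onto the teacher's action for the same observation.
    The (point cloud, proprioception, action) batch layout is an
    illustrative assumption."""
    point_cloud, proprio, teacher_action = batch
    pred_action = student(point_cloud, proprio)   # student's joint action
    loss = F.mse_loss(pred_action, teacher_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```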
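Finally, a sketch of how a residual policy can augment a student policy at deployment time (step 5). The additive composition and the module interfaces are assumptions based on this card's description of residual policies augmenting the simulation base policies, not the exact interfaces in the TRANSIC codebase.

```python
import torch

@torch.no_grad()
def combined_action(student, residual, point_cloud, proprio):
    """Deployment-time composition sketch: the simulation-trained student
    proposes a joint action, and the residual policy, learned from human
    correction data, outputs a correction that is added to it. The
    additive form and call signatures are illustrative assumptions."""
    base_action = student(point_cloud, proprio)
    correction = residual(point_cloud, proprio, base_action)
    return base_action + correction
```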