ChenYi99 committed
Commit b3485f7 (parent: 2e054a8)

Update README.md

Files changed (1): README.md (+136 -3)

README.md (contents after this commit):

---
license: apache-2.0
---

# Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
<a href='https://github.com/TencentARC/Moto'><img src='https://img.shields.io/badge/Github-black'></a>

## 🚀Introduction

>Recent developments in Large Language Models (LLMs) pre-trained on extensive corpora have shown significant success in various natural language processing (NLP) tasks with minimal fine-tuning.
>This success offers new promise for robotics, which has long been constrained by the high cost of action-labeled data. We ask: given the abundant video data containing interaction-related knowledge available as a rich "corpus", <b><i>can a similar generative pre-training approach be effectively applied to enhance robot learning?</i></b> The key challenge is to identify an effective representation for autoregressive pre-training that benefits robot manipulation tasks.
>Inspired by the way humans learn new skills through observing dynamic environments, we propose that effective robotic learning should emphasize motion-related knowledge, which is closely tied to low-level actions and is hardware-agnostic, facilitating the transfer of learned motions to actual robot actions.
>
>To this end, we introduce <b>Moto</b>, which converts video content into latent <b>Mo</b>tion <b>To</b>ken sequences with a Latent Motion Tokenizer, learning a bridging "language" of motion from videos in an unsupervised manner.
>We pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge. After pre-training, Moto-GPT demonstrates the promising ability to produce semantically interpretable motion tokens, predict plausible motion trajectories, and assess trajectory rationality through output likelihood.
>To transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control. Extensive experiments show that the fine-tuned Moto-GPT exhibits superior robustness and efficiency on robot manipulation benchmarks, underscoring its effectiveness in transferring knowledge from video data to downstream visual manipulation tasks.
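
To make the two-stage pipeline above concrete, here is a minimal, runnable toy sketch in PyTorch. Every name and size in it (`ToyMotionTokenizer`, `ToyMotoGPT`, the 512-d frame features, the 128-entry codebook) is an illustrative assumption rather than Moto's actual architecture: a tokenizer discretizes frame-to-frame transitions into motion tokens, and a small causal transformer models the token sequence autoregressively.

```python
# Toy sketch of the Moto idea -- all module names and sizes are illustrative
# assumptions, not the repository's actual implementation.
import torch
import torch.nn as nn

class ToyMotionTokenizer(nn.Module):
    """Maps each frame-to-frame transition to its nearest discrete codebook entry."""
    def __init__(self, frame_dim=512, codebook_size=128):
        super().__init__()
        self.encoder = nn.Linear(2 * frame_dim, frame_dim)      # encodes (frame_t, frame_{t+1}) pairs
        self.codebook = nn.Embedding(codebook_size, frame_dim)  # discrete "motion vocabulary"

    def encode(self, frame_feats):
        # frame_feats: (B, T, frame_dim) per-frame visual features
        pairs = torch.cat([frame_feats[:, :-1], frame_feats[:, 1:]], dim=-1)
        z = self.encoder(pairs)                                 # (B, T-1, frame_dim)
        # squared distance from each transition embedding to every codebook entry
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        return dists.argmin(dim=-1)                             # motion tokens: (B, T-1)

class ToyMotoGPT(nn.Module):
    """Tiny causal transformer that predicts the next motion token (GPT-style)."""
    def __init__(self, codebook_size=128, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, codebook_size)

    def forward(self, tokens):
        T = tokens.size(1)
        # additive causal mask: -inf above the diagonal blocks attention to future tokens
        mask = torch.full((T, T), float("-inf")).triu(1)
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)                                     # next-token logits: (B, T-1, codebook_size)

frame_feats = torch.randn(2, 8, 512)               # stand-in for per-frame visual features
tokens = ToyMotionTokenizer().encode(frame_feats)  # the learned "language" of motion
logits = ToyMotoGPT()(tokens)                      # autoregressive motion prediction
print(tokens.shape, logits.shape)                  # torch.Size([2, 7]) torch.Size([2, 7, 128])
```

In Moto itself, the Latent Motion Tokenizer is learned from raw video without action labels, and the co-fine-tuning stage later connects predicted motion tokens to real robot actions.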

## ⚙️Quick Start

### Installation
Clone the repo:
```bash
git clone https://github.com/TencentARC/Moto.git
```

Install minimal requirements for Moto training and inference:
```bash
conda create -n moto python=3.8
conda activate moto
cd Moto
pip install -r requirements.txt
cd ..
```

[Optional] Set up the conda environment for evaluating Moto-GPT on the [CALVIN](https://github.com/mees/calvin) benchmark:

```bash
conda create -n moto_for_calvin python=3.8
conda activate moto_for_calvin

git clone --recurse-submodules https://github.com/mees/calvin.git
pip install setuptools==57.5.0
cd calvin
cd calvin_env; git checkout main
cd ../calvin_models
# Unpin pytorch-lightning and torch so pip can resolve compatible versions
sed -i 's/pytorch-lightning==1.8.6/pytorch-lightning/g' requirements.txt
sed -i 's/torch==1.13.1/torch/g' requirements.txt
cd ..
sh ./install.sh
cd ..

# System packages for headless (EGL/OSMesa) rendering
sudo apt-get install -y libegl1-mesa libegl1
sudo apt-get install -y libgl1
sudo apt-get install -y libosmesa6-dev
sudo apt-get install -y patchelf

cd Moto
pip install -r requirements.txt
cd ..
```

[Optional] Set up the conda environment for evaluating Moto-GPT on the [SIMPLER](https://github.com/simpler-env/SimplerEnv) benchmark:
```bash
source /data/miniconda3/bin/activate  # adjust to your own conda installation path
conda create -n moto_for_simpler python=3.10 -y
conda activate moto_for_simpler

git clone https://github.com/simpler-env/SimplerEnv --recurse-submodules
pip install numpy==1.24.4
cd SimplerEnv/ManiSkill2_real2sim
pip install -e .
cd ..  # back to the SimplerEnv repo root
pip install -e .
sudo apt install ffmpeg
pip install setuptools==58.2.0
pip install tensorflow==2.15.0
pip install -r requirements_full_install.txt
pip install tensorflow[and-cuda]==2.15.1
pip install git+https://github.com/nathanrooy/simulated-annealing
cd ..

cd Moto
pip install -r requirements.txt
cd ..
```

### Model Weights
We release the Latent Motion Tokenizer, the pre-trained Moto-GPT, and the fine-tuned Moto-GPT on [Hugging Face](https://huggingface.co/TencentARC/Moto).
You can download them separately and save them in the corresponding directories (`latent_motion_tokenizer/checkpoints/` and `moto_gpt/checkpoints/`), as sketched below.
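
A minimal sketch for fetching the checkpoints programmatically with `huggingface_hub` (the `local_dir` name is an arbitrary example; check the file list on the Hub page, then move the weights into the directories above):

```python
# Illustrative sketch, not part of the repo: download the released checkpoints.
from huggingface_hub import snapshot_download

# Fetch the full TencentARC/Moto snapshot into a local folder
# ("moto_checkpoints" is an example path, not a required layout).
snapshot_download(repo_id="TencentARC/Moto", local_dir="moto_checkpoints")
```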

## 💻Inference

### Generate latent motion trajectories with the pre-trained Moto-GPT
```bash
conda activate moto
export PROJECT_ROOT=[your path to Moto project]
cd ${PROJECT_ROOT}/scripts
nohup bash run_latent_motion_generation.sh > run_latent_motion_generation.log 2>&1 &
tail -f run_latent_motion_generation.log
```

### Evaluating the fine-tuned Moto-GPT on robot manipulation benchmarks

Evaluation on CALVIN:
```bash
conda activate moto_for_calvin
export PROJECT_ROOT=[your path to Moto project]
cd ${PROJECT_ROOT}/scripts
nohup bash evaluate_moto_gpt_in_calvin.sh > evaluate_moto_gpt_in_calvin.log 2>&1 &
tail -f evaluate_moto_gpt_in_calvin.log
```

Evaluation on SIMPLER:
```bash
conda activate moto_for_simpler
export PROJECT_ROOT=[your path to Moto project]
cd ${PROJECT_ROOT}/scripts
nohup bash evaluate_moto_gpt_in_simpler.sh > evaluate_moto_gpt_in_simpler.log 2>&1 &
tail -f evaluate_moto_gpt_in_simpler.log
```

## 📝To Do
- [x] Release the Latent Motion Tokenizer
- [x] Release the pre-trained and fine-tuned Moto-GPT
- [x] Release the inference code
- [ ] Release the training code

## 🙌Acknowledgement
This repo benefits from [Phenaki-Pytorch](https://github.com/lucidrains/phenaki-pytorch), [GR-1](https://github.com/bytedance/GR-1), and [GR1-Training](https://github.com/EDiRobotics/GR1-Training). Thanks for their wonderful work!