mlfu7 committed
Commit 5cb0391
Parent(s): 25f8968

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,3 +1,10 @@
- ---
- license: apache-2.0
- ---
+ # In-Context Imitation Learning via Next-Token Prediction
+ by <a href="https://max-fu.github.io">Max (Letian) Fu*</a>, <a href="https://qingh097.github.io/">Huang Huang*</a>, <a href="https://www.linkedin.com/in/gaurav-datta/">Gaurav Datta*</a>, <a href="https://yunliangchen.github.io/">Lawrence Yunliang Chen</a>, <a href="https://autolab.berkeley.edu/people">William Chung-Ho Panitch</a>, <a href="https://fangchenliu.github.io/">Fangchen Liu</a>, <a href="https://www.research.autodesk.com/people/hui-li/">Hui Li</a>, and <a href="https://goldberg.berkeley.edu">Ken Goldberg</a> at UC Berkeley and Autodesk (*equal contribution).
+
+ [[Paper](https://openreview.net/forum?id=tFEOOH9eH0)] | [[Project Page](https://github.com/Max-Fu/icrt)] | [[Checkpoints](https://huggingface.co/mlfu7/Touch-Vision-Language-Models)] | [[Dataset](https://huggingface.co/datasets/mlfu7/Touch-Vision-Language-Dataset)] | [[Citation](#citation)]
+
+ This repo contains the checkpoints for *In-Context Imitation Learning via Next-Token Prediction*. We investigate how to bring the few-shot, in-context learning capability of next-token prediction models (e.g., GPT) to real-robot imitation learning policies.
+
+ In particular, the pre-trained vision encoder and the ICRT model are stored separately: see [encoder](crossmae_rtx/cross-mae-rtx-vitb.pth) and [ICRT](icrt_vitb_droid_pretrained/icrt_vitb_droid_pretrained.pth), respectively.
+
+ Please refer to the [project page](https://github.com/Max-Fu/icrt) for instructions on installing the repo, training the model, and running inference.
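
The README points to the two checkpoint files added in this commit. Below is a minimal download sketch using `huggingface_hub`; the `REPO_ID` is a placeholder assumption (substitute the actual Hugging Face repo hosting this upload), while the file paths match the files in this commit.

```python
# Hedged sketch: fetch the two checkpoints with huggingface_hub.
# REPO_ID is an assumption; the filenames match this commit's layout.
from huggingface_hub import hf_hub_download

REPO_ID = "mlfu7/icrt"  # placeholder: replace with the real repo id

encoder_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="crossmae_rtx/cross-mae-rtx-vitb.pth",
)
icrt_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="icrt_vitb_droid_pretrained/icrt_vitb_droid_pretrained.pth",
)
print(encoder_path, icrt_path)
```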
crossmae_rtx/cross-mae-rtx-vitb.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2743c5a1ba4cbe296870a2f12a64d2653e7c257c7e8d25c3e2b197ace461f8fe
+ size 509274961
icrt_vitb_droid_pretrained/icrt_vitb_droid_pretrained.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:edaf5b17fa3c61e5ff0cf3dc254c8f8573197c607a63a39c7073b6066a69a545
+ size 366419235
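
Both `.pth` files are Git LFS pointers: the actual weights live in LFS storage, and each pointer records the spec version, the blob's SHA-256, and its size in bytes. A minimal sketch for checking downloaded files against the digests recorded above:

```python
# Hedged sketch: verify downloaded checkpoints against the SHA-256 digests
# recorded in the Git LFS pointer files above.
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-hundred-MB checkpoints don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digests copied from the LFS pointers in this commit.
expected = {
    "crossmae_rtx/cross-mae-rtx-vitb.pth":
        "2743c5a1ba4cbe296870a2f12a64d2653e7c257c7e8d25c3e2b197ace461f8fe",
    "icrt_vitb_droid_pretrained/icrt_vitb_droid_pretrained.pth":
        "edaf5b17fa3c61e5ff0cf3dc254c8f8573197c607a63a39c7073b6066a69a545",
}
for path, digest in expected.items():
    assert sha256sum(path) == digest, f"checksum mismatch for {path}"
```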
icrt_vitb_droid_pretrained/run.yaml ADDED
@@ -0,0 +1,81 @@
+ !!python/object:icrt.util.args.ExperimentConfig
+ dataset_cfg: !!python/object:icrt.util.args.DatasetConfig
+   action_noise: 0.0
+   dataset_json: config/real_expanded_4_icrt_dataset_config.json
+   goal_conditioned: false
+   non_overlapping: 32
+   num_repeat_traj: 2
+   num_weighted_steps: 30
+   proprio_noise: 0.005
+   rebalance_tasks: true
+   shuffle_repeat_traj: true
+   skip_step: false
+   sort_by_lang: true
+   task_barrier: true
+   task_names: null
+   vision_aug: true
+ device: cuda
+ dist_on_itp: false
+ dist_url: env://
+ load_config: null
+ local_rank: -1
+ logging_cfg: !!python/object:icrt.util.args.LoggingConfig
+   log_dir: /shared/projects/icrt/output/240604_2024
+   log_name: '240604_2024'
+   output_dir: /shared/projects/icrt/output/240604_2024
+ model_cfg: !!python/object:icrt.util.args.ModelConfig
+   policy_cfg: !!python/object:icrt.util.args.PolicyConfig
+     adapter_mlp_ratio: 4.0
+     adapter_num_heads: 8
+     camera_pos_emb: false
+     decoder_pred_head: mlp
+     kl_div_loss: false
+     llama_ckpt_dir: /home/mfu/checkpoints/llama-2/llama-2-7b
+     load_llama: true
+     lora_layer_idxs: null
+     lora_rank: 4
+     loss_w_action: 1.0
+     modality_pos_emb: false
+     multikv_attn_pool: false
+     no_prompt_loss: true
+     phase: pretrain
+     pred_action_only: true
+     pretrained_path: /shared/projects/icrt/output/240604_1852/checkpoint-3.pth
+     remove_proprio: false
+     scale_loss: 1.0
+     scratch_llama_config: ../config/model_config/custom_transformer.json
+     separate_camera_adapter: true
+     step_weight: 1.0
+   vision_encoder_cfg: !!python/object:icrt.util.args.VisionEncoderConfig
+     vision_encoder: /home/mfu/Documents/icrt/crossmae_ckpt/cross-mae-rtx.pth
+     vision_lora: false
+     vision_lora_rank: 8
+     vision_nonpretrained: false
+     vision_unfreeze_all: false
+     vision_unfreeze_last_n: 0
+ optimizer_cfg: !!python/object:icrt.util.args.OptimizerConfig
+   blr: 0.001
+   lr: 0.0005
+   min_lr: 0.0
+   warmup_epochs: 1.25
+   weight_decay: 0.01
+ shared_cfg: !!python/object:icrt.util.args.SharedConfig
+   batch_size: 1
+   num_cameras: 2
+   num_pred_steps: 16
+   num_stages: 1
+   resume: null
+   rot_6d: true
+   save_every: 5
+   seed: 0
+   seq_length: 512
+   split_epoch: 1
+   start_epoch: 0
+   use_delta_action: true
+ train: true
+ trainer_cfg: !!python/object:icrt.util.args.TrainerConfig
+   accum_iter: 8
+   epochs: 125
+   num_workers: 20
+   pin_memory: true
+ world_size: 1
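
`run.yaml` is a PyYAML dump of the training arguments, serialized with `!!python/object:` tags that instantiate `icrt.util.args.*` dataclasses. A minimal loading sketch (an assumption about usage, not the repo's documented API): it requires the `icrt` package to be importable so the tagged classes resolve, and `yaml.safe_load` will refuse the file, so an unsafe loader is needed. The absolute paths in the file (`llama_ckpt_dir`, `vision_encoder`, `log_dir`, `pretrained_path`) are machine-specific and would need to be edited first.

```python
# Hedged sketch: load run.yaml into its ExperimentConfig dataclass.
# Only use unsafe_load on configs you trust, since python/object tags
# can construct arbitrary Python objects.
import yaml

with open("icrt_vitb_droid_pretrained/run.yaml") as f:
    cfg = yaml.unsafe_load(f)  # same as yaml.load(f, Loader=yaml.UnsafeLoader)

print(type(cfg))                        # icrt.util.args.ExperimentConfig
print(cfg.shared_cfg.seq_length)        # 512
print(cfg.model_cfg.policy_cfg.phase)   # pretrain
```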