Ejafa committed on
Commit
9c0c4b2
1 Parent(s): db8be2a

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,20 @@
+ # Mixed-modal and Text-only Prompts for Human Evaluation
+
+ This file ```prompts_for_human_evaluations.jsonl``` contains the 1,048 prompts used to evaluate Chameleon's output: 441 (42.1%) are mixed-modal (i.e., containing both text and images), and the remaining 607 (57.9%) are text-only. The expected responses are mixed-modal, containing both text and images.
+
+ ## Background
+
+ We work with a third-party crowdsourcing vendor to collect a set of diverse and natural prompts from human annotators. Specifically, we ask annotators to think creatively about what they would want a multi-modal model to generate in different real-life scenarios. For example, for the scenario of “imagine you are in a kitchen”, annotators may come up with prompts like “How to cook pasta?” or “How should I design the layout of my island? Show me some examples.” The prompts can be text-only or text with some images, and the expected responses should be mixed-modal, containing both text and images.
+
+ After collecting an initial set of prompts, we ask three random annotators to evaluate whether each prompt is clear and whether they expect the response to contain images. We use a majority vote to filter out unclear prompts and prompts that do not expect mixed-modal responses. In the end, our final evaluation set contains
+ 1,048 prompts: 441 (42.1%) are mixed-modal (i.e., containing both text and images), and the remaining 607 (57.9%) are text-only.
+
+ More details on how these prompts were collected, along with some statistics, can be found in the [paper](https://arxiv.org/pdf/2405.09818).
+
+ ## File format
+
+ Each line of the file ```prompts_for_human_evaluations.jsonl``` defines a prompt, with the following fields (see the loading sketch after this list):
+ - ```id```: The GUID of this prompt.
+ - ```prompt```: The prompt content. If the prompt contains images, their positions are marked by the special ```<img>``` token.
+ - ```task_type```: The task category of this prompt.
+ - ```image_urls```: A list of the URLs of the images used in the prompt. Each URL maps to an ```<img>``` token in the prompt, in order.
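As a quick illustration of this schema, here is a minimal Python sketch that loads the file and splits prompts by modality. Field names follow the list above; treating a missing or empty ```image_urls``` as "text-only" is an assumption about how those records are encoded.

```python
import json

# Minimal sketch: load the evaluation prompts and split them by modality.
# Field names follow the README's schema; assuming "image_urls" is empty
# or absent for text-only prompts.
mixed, text_only = [], []
with open("prompts_for_human_evaluations.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        urls = record.get("image_urls") or []
        # Per the README, each URL corresponds to one <img> token, in order.
        (mixed if urls else text_only).append(record)

print(len(mixed), "mixed-modal;", len(text_only), "text-only")  # expect 441 / 607
```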
models/7b/checklist.chk ADDED
@@ -0,0 +1,3 @@
+ 60333d9acd866e4b5e6690ecabee3b65 consolidated.pth
+ b2cbf6940c157b6e969f6263388efcc3 consolidate_params.json
+ 8bc0e859d4afa00f5f40a9de296eef8e params.json
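The ```checklist.chk``` files pair an MD5 digest with each artifact in the same directory. A minimal verification sketch, assuming the files have been downloaded next to the checklist:

```python
import hashlib
from pathlib import Path

# Sketch: verify the MD5 digests listed in models/7b/checklist.chk.
# Each non-empty line is "<md5> <filename>", relative to the checklist.
base = Path("models/7b")
for entry in (base / "checklist.chk").read_text().splitlines():
    if not entry.strip():
        continue
    digest, name = entry.split()
    md5 = hashlib.md5()
    with open(base / name, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    print(name, "OK" if md5.hexdigest() == digest else "MISMATCH")
```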
models/7b/consolidate_params.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "dtype": "bf16",
+   "model_parallel_size": 1,
+   "on_gpu": true,
+   "src": "/fsx-onellm/rpasunuru/SFT/v2.1_textpp_7b_1366k_sftv1.4_exp1/v2.1_textpp_7b_1366k_sftv1.4_exp1_run000/checkpoints/checkpoint_0001200_noimggen/",
+   "tgt": "/fsx-onellm/rpasunuru/SFT/v2.1_textpp_7b_1366k_sftv1.4_exp1/v2.1_textpp_7b_1366k_sftv1.4_exp1_run000/checkpoints/checkpoint_0001200_noimggen_consolidated/",
+   "tokenizer_path": null
+ }
models/7b/consolidated.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:284172dd1aa7d6520277e6565080748031a730a3dc557a07ae6603ea60335db2
+ size 14026453679
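```consolidated.pth``` is stored as a Git LFS pointer, so a clone without LFS yields only the three-line stanza above rather than the ~14 GB checkpoint (14,026,453,679 bytes per the pointer). A hedged sketch for fetching the real file with ```huggingface_hub```; the ```repo_id``` below is a placeholder for whatever repository this commit lives in:

```python
from huggingface_hub import hf_hub_download

# Sketch: resolve the LFS pointer to the actual checkpoint bytes.
# "Ejafa/chameleon-7b" is a hypothetical repo_id; substitute the real one.
path = hf_hub_download(
    repo_id="Ejafa/chameleon-7b",
    filename="models/7b/consolidated.pth",
)
print(path)  # local cache path of the downloaded checkpoint
```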
models/7b/params.json ADDED
@@ -0,0 +1,169 @@
+ {
+   "async_checkpointing": false,
+   "async_eval_ngpus": -1,
+   "batch_size": 2,
+   "data": "",
+   "disable_logging": false,
+   "disable_workers_print": false,
+   "dtype": "bf16",
+   "dump_after_steps": 0,
+   "dump_dir": "/fsx-onellm/rpasunuru/SFT/v2.1_textpp_7b_1366k_sftv1.4_exp1/v2.1_textpp_7b_1366k_sftv1.4_exp1_run000",
+   "dump_freq": 400,
+   "dump_profile_traces": false,
+   "enable_loss_tracker": false,
+   "epochs": -1,
+   "eval_freq": 400,
+   "exp_id": "",
+   "exp_name": "",
+   "finetuning_dir": "/fsx-onellm/shared/from_rsc/v2.1_7b_dr_qk_zloss_linear_zero3_sft_optiml_textpp_run000_checkpoint_1366000",
+   "fp32_reduce_scatter": "all",
+   "gpu_check_level": 3,
+   "image_loss_weight": 1.0,
+   "image_text_rotation_prob": 0.0,
+   "instruct": {
+     "no_loss_prompt": true,
+     "no_loss_truncated": false,
+     "use_eot": true
+   },
+   "instruct_data": "/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/long_caption:2.92,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/vqa:4.59,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/text2image:10.44,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/llama2_rjv6_helpful:43.27,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/code_llama:0.51,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/interleaved_batch1-17:27.45,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/image_dialogue:7.46,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/llama2_rjv6_harmless:0.97,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/cybersec_safety:0.33,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/onellm_multimodal_safety:0.86,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/autosafety:0.51,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/rainbow_safety:0.10,/fsx-onellm/shared/finetuning/sft_v1.4_data/splits/train/genai_safety:0.58",
+   "iter_gopher": {
+     "buffer_size": 16,
+     "max_precompute": 10,
+     "n_chars_by_tok": 15,
+     "n_seqs_to_concat": 10,
+     "num_processes": 1
+   },
+   "iter_jsonl": {
+     "buffer_size": 64,
+     "same_data": false
+   },
+   "iter_multi": {
+     "buffer_size": 512,
+     "ignore_extra_chunks": true,
+     "max_precompute": 20,
+     "multiprocess": true
+   },
+   "iter_type": "multi",
+   "keep_checkpoints_every_steps": 400,
+   "keep_eval_checkpoints": true,
+   "keep_n_last_checkpoints": 2,
+   "log_all_steps": false,
+   "log_freq": 10,
+   "log_updates": true,
+   "log_wandb": false,
+   "loss_rescaling": false,
+   "model": {
+     "add_extra_toks": "0",
+     "alpha_depth": "disabled",
+     "attn_dropout": 0,
+     "attn_to_keep": "all",
+     "custom_bwd": false,
+     "dim": 4096,
+     "dropout": 0.05,
+     "efficient_attn": "flash",
+     "emb_dropout": 0,
+     "ffn_dim_multiplier": 1.0,
+     "ffn_dropout": 0,
+     "full_logging_n_layers": 4,
+     "fuse_sequence_parallel": false,
+     "init": {
+       "coeff_std": null,
+       "depth_last": false,
+       "fixed_std": null,
+       "no_init": false,
+       "pos_init_scalar": null,
+       "use_depth": "current",
+       "use_gaussian": true
+     },
+     "layer_ckpt": "none",
+     "linear_residual_dropout": false,
+     "loss_parallel": false,
+     "max_length": 2048,
+     "multiple_of": 256,
+     "n_heads": 32,
+     "n_kv_heads": null,
+     "n_layers": 32,
+     "non_linearity": "swiglu",
+     "norm_affine": true,
+     "norm_eps": 1e-05,
+     "norm_type": "rmsnorm",
+     "output_dropout": 0,
+     "output_size": -1,
+     "pre_norm": true,
+     "qk_normalization": true,
+     "recompute_attn": true,
+     "recompute_fc1_out": true,
+     "recompute_fc3_out": true,
+     "residual_dropout": 0.0,
+     "rope_theta": 10000.0,
+     "sequence_parallel": false,
+     "swin_norm": false,
+     "turn_eos_token": "<eos>",
+     "use_rope": true,
+     "vocab_size": 65536
+   },
+   "model_parallel_size": 1,
+   "no_final_ckpt": false,
+   "num_retrieved_docs": 0,
+   "old_mp": -1,
+   "old_world_size": -1,
+   "optim": {
+     "beta1": 0.9,
+     "beta2": 0.95,
+     "clip": 1.0,
+     "cosine_theta": 1.0,
+     "cycle_length": 1.0,
+     "epsilon": 1e-08,
+     "exp_factor": 0.5,
+     "lr": 1e-05,
+     "lr_min_ratio": 0.1,
+     "scheduler": "cosine",
+     "use_deprecated_optim": false,
+     "warmup": 100,
+     "weight_decay": 0.1
+   },
+   "periodic_gpu_check": true,
+   "profile_freq": -1,
+   "reshard_after_forward": false,
+   "restore_dataloader_position": false,
+   "retrieval_prob": 0.0,
+   "rlhf": null,
+   "root_dump_dir": "",
+   "save_optimizer_states": true,
+   "seq_len": 4096,
+   "slurm": {
+     "global_rank": 0,
+     "is_slurm_job": true,
+     "world_size": 64
+   },
+   "steps": 1200,
+   "tokenizer": "/fsx-onellm/rpasunuru/models/cm3z/cm3v2_7b_placeholder/gpt2-unified-image-sentinel.json",
+   "tokenizer_dir": "/fsx/guismay/data/large_experiments/fair_llm/datasets/tokenizers",
+   "torch_seed": -1,
+   "unlimited_steps": false,
+   "use_hf_tokenizer": true,
+   "valid": {
+     "batch_size": 32,
+     "debug": false,
+     "majority_voting": 0,
+     "n_batches": 100,
+     "onellm_eval": false,
+     "onellm_eval_media_storage": "",
+     "ppl_files_str": "",
+     "prompt_path": "",
+     "prompt_templates": "{}",
+     "random_fewshots": false,
+     "seq_len": 2048,
+     "tasks_root_dir": "",
+     "tasks_str": "",
+     "temperature": 1.0,
+     "top_k": 0,
+     "top_p": 0.0,
+     "use_sampling": false,
+     "write_eval": false
+   },
+   "wandb_entity": "violet-zct",
+   "wandb_project": "instruct_sft",
+   "water_marking_codes_str": null,
+   "z_loss_weight": 0.0001
+ }
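The training configuration above records the 7B architecture directly, so a few lines of Python suffice to pull out the key hyperparameters; this sketch only assumes the file sits at ```models/7b/params.json``` relative to the working directory:

```python
import json

# Sketch: read the shipped training config and print the model shape.
with open("models/7b/params.json") as f:
    params = json.load(f)

m = params["model"]
print("dim:", m["dim"])                        # 4096
print("n_layers:", m["n_layers"])              # 32
print("n_heads:", m["n_heads"])                # 32
print("vocab_size:", m["vocab_size"])          # 65536
print("norm:", m["norm_type"], m["norm_eps"])  # rmsnorm 1e-05
print("qk_normalization:", m["qk_normalization"])  # True
```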
prompts_for_human_evaluations.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/checklist.chk ADDED
@@ -0,0 +1,3 @@
+ 170a932b687671a4e676f3bf69147295 text_tokenizer.json
+ 1a559fb5dab4d351d19496ae89da1db1 vqgan.ckpt
+ 25724c8110d6adabc9130a123b4b922e vqgan.yaml
tokenizer/text_tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/vqgan.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ede986bf6b171db3081ce171ad88e4ac970793cea14c180b3e5ac5105f4cb43
+ size 281270377
tokenizer/vqgan.yaml ADDED
@@ -0,0 +1,57 @@
+ model:
+   base_learning_rate: 4.5e-06
+   target: taming.models.vqgan.VQModel
+   params:
+     embed_dim: 256
+     n_embed: 8192
+     ddconfig:
+       double_z: false
+       z_channels: 256
+       resolution: 512
+       in_channels: 3
+       out_ch: 3
+       ch: 128
+       ch_mult:
+       - 1
+       - 1
+       - 2
+       - 2
+       - 4
+       num_res_blocks: 2
+       attn_resolutions: []
+       dropout: 0.0
+     lossconfig:
+       target: taming.modules.losses.vqperceptual_vit_vqgan.VQLPIPSWithDiscriminator
+       params:
+         disc_start: 100001
+         perceptual_weight: 1.0
+         adversarial_weight: 0.5
+         disc_params:
+           size: 512
+     ckpt_path: manifold://fair_onellm_checkpoints/tree/v2/tokenizer/vqgan_wm_0209.ckpt
+ data:
+   target: main.DataModuleFromConfig
+   params:
+     batch_size: 4
+     num_workers: 10
+     image_size: 512
+     filter_image_size: 512
+     dataset: coco
+     aesthetics_th: 0
+     clipsim_th: 0
+ --distributed-world-size: null
+ '32': null
+ --distributed-port: null
+ '17338': null
+ --save-dir: null
+ /checkpoint/shellysheynin/shutterstock/512x512_1024tokens_4node_shutterstock_laion_no_attn_styleGAN:
+   log_every-500:
+     ngpu32: null
+ --tensorboard-logdir: null
+ /checkpoint/shellysheynin/tensorboard_logs/2023-03-30/512x512_1024tokens_4node_shutterstock_laion_no_attn_styleGAN:
+   log_every-500:
+     ngpu32: null
+ '14561': null
+ /checkpoint/shellysheynin/tensorboard_logs/2023-04-02/512x512_1024tokens_4node_shutterstock_laion_no_attn_styleGAN:
+   log_every-500:
+     ngpu32: null
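The ```model:``` block follows the taming-transformers config convention, where ```target``` names a class and ```params``` are its constructor arguments; the trailing ```--distributed-*``` entries and bare checkpoint paths look like command-line residue baked into the YAML and can be ignored. Below is a hedged sketch of that instantiation pattern. It assumes a ```taming``` package exposing the referenced modules is importable (the ```vqperceptual_vit_vqgan``` loss may exist only in Meta's fork, not the public release) and overrides the internal ```manifold://``` checkpoint path with the local ```vqgan.ckpt```:

```python
import importlib
from omegaconf import OmegaConf

def instantiate_from_config(config):
    # taming-transformers convention: "target" is a dotted class path,
    # "params" are the constructor's keyword arguments.
    module, cls = config["target"].rsplit(".", 1)
    return getattr(importlib.import_module(module), cls)(**config.get("params", {}))

cfg = OmegaConf.load("tokenizer/vqgan.yaml")
# The shipped ckpt_path points at an internal manifold:// store;
# point it at the LFS-downloaded checkpoint instead.
cfg.model.params.ckpt_path = "tokenizer/vqgan.ckpt"
vqgan = instantiate_from_config(cfg.model)  # taming.models.vqgan.VQModel
```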