Upload 8 files
- README.md +60 -3
- adapter_config.json +26 -0
- adapter_model.bin +3 -0
- eval_trajectory.jsonl +11 -0
- harmless_base_eval_results.jsonl +0 -0
- helpful_base_eval_results.jsonl +0 -0
- helpful_online_eval_results.jsonl +0 -0
- helpful_rejection_eval_results.jsonl +0 -0
README.md
CHANGED
@@ -1,3 +1,60 @@
## HH-RLHF QLoRA adapters trained with Direct Preference Optimization for the experiments with A-LoL

| [Paper](https://arxiv.org/abs/2305.14718) | [Code](https://github.com/abaheti95/LoL-RL) |

### Model description

We continued QLoRA finetuning on the [HH-RLHF dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf), starting from [timdettmers/qlora-hh-rlhf-7b](https://huggingface.co/timdettmers/qlora-hh-rlhf-7b), for 1 epoch. [DPO](https://arxiv.org/abs/2305.18290) finetunes the language model using both the chosen and the rejected response of each preference pair. We use DPO as the main preference-based offline RL baseline against which the Advantage Leftover Lunch (A-LoL) methods are compared.

⚠️ These models are intended purely for research purposes and may produce problematic outputs.
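Concretely, DPO optimizes the margin between the policy's and a frozen reference model's log-probabilities of the chosen versus the rejected response. A minimal sketch of the loss, assuming sequence-level log-probabilities have already been computed; `beta` is the usual KL-strength hyperparameter (this is an illustrative sketch, not the exact training code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))), batch-averaged.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```

When the policy matches the reference model, the margin is zero and the loss sits at log 2; preferring the chosen response more than the reference does drives the loss below that.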
### Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
tokenizer.truncation_side = "left"
tokenizer.padding_side = "left"
base_model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={i: "40000MB" for i in range(torch.cuda.device_count())},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
    ),
)
model = PeftModel.from_pretrained(base_model, "abaheti95/dpo_qlora_hh")

prefix = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n### Human: How would I go about making homemade chewing gum? ### Assistant: Well, it\u2019s pretty easy, but you can either make it a bit better or more interesting if you use xanthan gum, which is a commonly used thickener. Xanthan gum is added to food products as a thickener, gelling agent, emulsifier, and stabilizer, which means that it allows sugar syrups to form gels. If you can get hold of some, you could use it to make really interesting gum! ### Human: Okay what do I do when I have the xanthan gum? What else do I need to add? ### Assistant: Well, the important ingredient is the gums. You can find many different types at your local pharmacy, which should be able to help you figure out what you need. You will need: 1) a mixer, 2) sugar, 3) xanthan gum, 4) flavoring, 5) gum base, and 6) a dehydrator, which is a special oven that dries food. ### Human: Why would I need a dehydrator to make gum? ### Assistant:"

# Reserve 128 tokens for generation within the 768-token context window.
current_batch_prefixes_inputs = tokenizer(
    [prefix], max_length=768 - 128, truncation=True,
    add_special_tokens=True, padding=True, return_tensors="pt",
).to(model.device)
# Token id 835 decodes to "###", used as an additional stop token.
predicted_sents = model.generate(
    **current_batch_prefixes_inputs, max_new_tokens=128,
    pad_token_id=tokenizer.pad_token_id, num_beams=1, do_sample=False,
    num_return_sequences=1, eos_token_id=[835, tokenizer.eos_token_id],
)
response_tokens = predicted_sents[:, current_batch_prefixes_inputs["input_ids"].shape[-1]:]
responses = tokenizer.batch_decode(response_tokens, skip_special_tokens=True)
# Normalize responses: cut at the next "Human:" turn and drop a trailing "###".
responses_normalized = [resp.split("\n Human:")[0].split("\nHuman:")[0].split("\n### Human")[0].strip() for resp in responses]
responses_normalized = [resp.replace("###", "").strip() if resp.endswith("###") else resp.strip() for resp in responses_normalized]
```
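The normalization step at the end can be exercised in isolation. A self-contained rerun of the same splitting and stripping logic, on hypothetical raw generations rather than real model output:

```python
def normalize(responses):
    # Cut each response at the start of the next "Human:" turn.
    out = [r.split("\n Human:")[0].split("\nHuman:")[0].split("\n### Human")[0].strip()
           for r in responses]
    # Strip a trailing "###" stop marker if present.
    return [r.replace("###", "").strip() if r.endswith("###") else r.strip()
            for r in out]

raw = [
    "You need a dehydrator to dry the gum base. ###",
    "It helps remove moisture.\n### Human: Thanks!",
]
print(normalize(raw))
# → ['You need a dehydrator to dry the gum base.', 'It helps remove moisture.']
```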
We also provide the model's evaluation results on the test sets in the files `harmless_base_eval_results.jsonl`, `helpful_base_eval_results.jsonl`, `helpful_online_eval_results.jsonl`, and `helpful_rejection_eval_results.jsonl`.

### Framework version and configuration

- PEFT 0.5.0

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float16
adapter_config.json
ADDED
@@ -0,0 +1,26 @@
{
    "auto_mapping": null,
    "base_model_name_or_path": "/gscratch/zlab/llama/7B",
    "bias": "none",
    "fan_in_fan_out": false,
    "inference_mode": true,
    "init_lora_weights": true,
    "layers_pattern": null,
    "layers_to_transform": null,
    "lora_alpha": 16.0,
    "lora_dropout": 0.1,
    "modules_to_save": null,
    "peft_type": "LORA",
    "r": 64,
    "revision": null,
    "target_modules": [
        "gate_proj",
        "down_proj",
        "up_proj",
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj"
    ],
    "task_type": "CAUSAL_LM"
}
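For reference, LoRA scales its low-rank update by `lora_alpha / r`, so this adapter applies its update with a factor of 16/64. A quick sanity check, with the relevant values copied from the config above:

```python
import json

# LoRA computes W + (lora_alpha / r) * B @ A; derive the scaling
# implied by the adapter config (values copied from adapter_config.json).
config = json.loads('{"lora_alpha": 16.0, "r": 64, "peft_type": "LORA"}')
scaling = config["lora_alpha"] / config["r"]
print(scaling)  # → 0.25
```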
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:46ae46cc13a4ab2ab925d52d948b7ddab41ea9cdb076b947ad0b6169c2105bab
size 639792909
eval_trajectory.jsonl
ADDED
@@ -0,0 +1,11 @@
{"step": 0, "avg_reward": 0.4714483038156426}
{"step": 7200, "avg_reward": 0.5663666518924791}
{"step": 14400, "avg_reward": 0.569705726823096}
{"step": 21600, "avg_reward": 0.5740199956743579}
{"step": 28800, "avg_reward": 0.573887342033309}
{"step": 36000, "avg_reward": 0.6045272605834595}
{"step": 43200, "avg_reward": 0.5405997881724034}
{"step": 50400, "avg_reward": 0.502069472695335}
{"step": 57600, "avg_reward": 0.6652014100092596}
{"step": 64800, "avg_reward": 0.5776967554685792}
{"step": 72000, "avg_reward": 0.6279109552519263}
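`eval_trajectory.jsonl` logs the average reward every 7200 training steps. A short sketch for locating the best-scoring checkpoint from such a JSON Lines log (a subset of the records is inlined here for illustration):

```python
import json

# Parse the reward trajectory and report the step with the highest
# average reward (records copied from eval_trajectory.jsonl).
log = """\
{"step": 0, "avg_reward": 0.4714483038156426}
{"step": 36000, "avg_reward": 0.6045272605834595}
{"step": 57600, "avg_reward": 0.6652014100092596}
{"step": 72000, "avg_reward": 0.6279109552519263}"""

records = [json.loads(line) for line in log.splitlines()]
best = max(records, key=lambda r: r["avg_reward"])
print(best["step"])  # → 57600
```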
harmless_base_eval_results.jsonl
ADDED
The diff for this file is too large to render.

helpful_base_eval_results.jsonl
ADDED
The diff for this file is too large to render.

helpful_online_eval_results.jsonl
ADDED
The diff for this file is too large to render.

helpful_rejection_eval_results.jsonl
ADDED
The diff for this file is too large to render.