alnrg2arg committed a5263a5 (parent: 83e6912): Update README.md
---
language:
- en
license: cc-by-nc-4.0
tags:
- text-generation-inference
- transformers
- mistral
- trl
base_model: alnrg2arg/blockchainlabs_7B_merged_test2_4
datasets:
- Intel/orca_dpo_pairs
---

This is a model from blockchainlab test 2.4 - alnrg2arg/blockchainlabs_7B_merged_test2_4.

The project aims to build a small LLM for on-device use.

The overall pipeline for this iteration is:

1. Merge models to build the base model (7B).
2. Prune the model to reduce the parameter count (50% sparsity).
3. For the recovery phase after pruning, DPO is chosen.
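
The pruning method itself is not detailed in this card. As a minimal sketch only, unstructured magnitude pruning to 50% sparsity can be done with PyTorch's built-in pruning utilities (the toy layer size here is illustrative, not the actual 7B model):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for one linear layer of the model (the real model is 7B).
layer = nn.Linear(64, 64)

# Step 2 of the pipeline: zero out the 50% of weights with the smallest
# absolute value (unstructured L1 magnitude pruning).
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")  # bake the pruning mask into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2f}")  # ≈ 0.50
```

Step 3 then fine-tunes the pruned weights with DPO to recover quality lost to pruning.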

This model, which is not pruned, is intended as the comparison baseline for the pruned model.

This is the code and the parameters I chose for this model (DPO):

27
+ ```
28
+ from transformers import TrainingArguments, AutoModelForCausalLM
29
+ from trl import DPOTrainer
30
+
31
+ dpo_trainer = DPOTrainer(
32
+ model = model,
33
+
34
+ ref_model = None,
35
+ args = TrainingArguments(
36
+ per_device_train_batch_size = 8,
37
+ gradient_accumulation_steps = 8,
38
+ warmup_ratio = 0.1,
39
+ num_train_epochs = 3,
40
+ learning_rate = 5e-6,
41
+ fp16 = not torch.cuda.is_bf16_supported(),
42
+ bf16 = torch.cuda.is_bf16_supported(),
43
+ logging_steps = 1,
44
+ optim = "adamw_8bit",
45
+ weight_decay = 0.0,
46
+ lr_scheduler_type = "linear",
47
+ seed = 42,
48
+ output_dir = "output_DPO",
49
+ ),
50
+ beta = 0.1,
51
+ train_dataset = dataset,
52
+ # eval_dataset = raw_datasets["test"],
53
+ tokenizer = tokenizer,
54
+ max_length = 1024,
55
+ max_prompt_length = 512,
56
+ )
57
+ ```
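
The `train_dataset` above comes from Intel/orca_dpo_pairs. TRL's `DPOTrainer` consumes `prompt` / `chosen` / `rejected` columns, while that dataset stores `system` / `question` / `chosen` / `rejected` fields, so a small mapping step is typically needed. A sketch with invented sample values (the real rows come from the dataset):

```python
# Hypothetical row in the Intel/orca_dpo_pairs schema
# (field names are real; the values here are invented for illustration).
raw = [{
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
    "chosen": "2 + 2 = 4.",
    "rejected": "2 + 2 = 5.",
}]

# Map into the prompt/chosen/rejected shape DPOTrainer expects
# (e.g. before wrapping with datasets.Dataset.from_list).
dataset_rows = [
    {
        "prompt": f"{r['system']}\n{r['question']}",
        "chosen": r["chosen"],
        "rejected": r["rejected"],
    }
    for r in raw
]

print(sorted(dataset_rows[0].keys()))  # ['chosen', 'prompt', 'rejected']
```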

The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
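
For reference, the objective the trainer above optimizes can be checked by hand. A toy computation of the standard DPO loss for a single preference pair, using the same `beta = 0.1` and made-up log-probabilities (not from the actual model):

```python
import math

# DPO loss for one pair:
# loss = -log sigmoid(beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)))
beta = 0.1
logp_chosen, ref_chosen = -10.0, -12.0    # policy / reference log-probs (toy values)
logp_rejected, ref_rejected = -11.0, -9.0

margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
loss = -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
print(f"{loss:.3f}")  # ≈ 0.513
```

A larger margin (the policy preferring the chosen answer more strongly than the reference does) drives the loss toward zero; `beta` scales how sharply the loss reacts to that margin.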