---
base_model: mlabonne/Marcoro14-7B-slerp
license: apache-2.0
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
---

# Model Card for decruz07/kellemar-DPO-Orca-Distilled-7B

<!-- Provide a quick summary of what the model is/does. -->

This model was created using mlabonne/Marcoro14-7B-slerp as the base and fine-tuned on argilla/distilabel-intel-orca-dpo-pairs.

## Model Details

Fine-tuned with the following parameters:

- Steps: 200
- Learning rate: 5e-5
- Beta: 0.1

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** @decruz
- **Funded by [optional]:** my full-time job
- **Finetuned from model [optional]:** mlabonne/Marcoro14-7B-slerp

## Benchmarks

## Uses

You can use this model for basic inference. You could also fine-tune it further if you want to.

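As a quick sketch, inference can be run through the Transformers `pipeline` API; the prompt and generation settings below are arbitrary illustrations:

```python
# Minimal sketch: text generation via the high-level Transformers pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="decruz07/kellemar-DPO-Orca-Distilled-7B",
    device_map="auto",
)

result = generator("What is Direct Preference Optimization?", max_new_tokens=128)
print(result[0]["generated_text"])
```
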
## How to Get Started with the Model

You can create a Hugging Face Space from this model, or call it directly from Python to run inference.

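A minimal load-and-generate sketch is shown below, assuming the repo id from this card's title and that the tokenizer ships a chat template (if it does not, pass a plain prompt string instead); dtype and sampling settings are illustrative:

```python
# Minimal sketch: load the model and generate a reply with Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decruz07/kellemar-DPO-Orca-Distilled-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a continuation.
messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
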
## Training Details

The following was used:

```python
# DPO fine-tuning setup with Hugging Face Transformers and TRL.
# `new_model`, `model`, `ref_model`, `tokenizer`, `dataset`, and `peft_config`
# are defined earlier in the notebook (see the sketch after this block).
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
```

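The trainer above relies on `model`, `ref_model`, `tokenizer`, and `peft_config` created beforehand. A minimal sketch of how those objects could be set up follows; the LoRA hyperparameters and target modules are illustrative assumptions, not the exact values used for this model (the training dataset is loaded under Training Data below):

```python
# Illustrative setup for the names referenced by the DPO trainer above.
# Hyperparameters here are placeholders, not the values used for this model.
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "mlabonne/Marcoro14-7B-slerp"
new_model = "kellemar-DPO-Orca-Distilled-7B"

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship no pad token

# Policy model to optimize plus a frozen reference copy for DPO's KL penalty.
model = AutoModelForCausalLM.from_pretrained(base_model)
ref_model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adapter configuration; r, alpha, and target modules are placeholders.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```
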
### Training Data

This model was trained on https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs

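As a sketch, the dataset can be loaded with `datasets` and mapped into the prompt/chosen/rejected format that `DPOTrainer` expects; the column names used in the mapping are assumptions based on the dataset card, so verify them before use:

```python
# Load the preference pairs and reshape them for DPO training.
from datasets import load_dataset

dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

def to_dpo_format(example):
    # Assumed columns: "system", "input", "chosen", "rejected".
    prompt = (example["system"] + "\n\n" + example["input"]).strip()
    return {
        "prompt": prompt,
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

dataset = dataset.map(to_dpo_format)
```
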
### Training Procedure

Trained with Maxime Labonne's Google Colab notebook on fine-tuning Mistral 7B with DPO.

## Model Card Authors [optional]

@decruz

## Model Card Contact

@decruz on X/Twitter