adamo1139 committed
Commit 2423728
1 Parent(s): f99dc90

Update README.md

  1. README.md +3 -1
README.md CHANGED
@@ -48,6 +48,8 @@ One big issue I noticed is that I think I set too small of a learning rate for S
 Other small issue is that when you enter a prompt that might have resulted with refusal in a previous model, the response will be more free-form and probably will have a touch of completion in it.
 So far, it seems like the strongest anti-refusal bias is at 0 ctx - the first prompt. But it's also present, albeit a little bit less, further down. I plan to expand rawrr dataset and include more samples without system prompt, this should help here.
 
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" alt="made with Unsloth" width="400" height="64"/>](https://github.com/unslothai/unsloth)
+
 
 ## Unsloth training parameters DPO Stage
 
@@ -71,7 +73,7 @@ So far, it seems like the strongest anti-refusal bias is at 0 ctx - the first pr
 - lora_alpha: 32
 - max_length: 2200
 - learning_rate: 0.00006
-- lr_scheduler_type: "cosine
+- lr_scheduler_type: "cosine"
 - lr_scheduler_kwargs: {
 "num_cycles" : 0.3,
 }
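
The scheduler settings in the second hunk (`lr_scheduler_type: "cosine"` with `num_cycles: 0.3`) mean the learning rate does not decay all the way to zero. The sketch below is a plain-Python illustration of that, assuming the trainer follows the same formula as the cosine schedule in Hugging Face `transformers`. The function name `cosine_multiplier`, the step count `TOTAL_STEPS`, and the zero warmup are hypothetical, since the diff does not show them. Only `learning_rate` and `num_cycles` come from the README.

```python
import math


def cosine_multiplier(step, total_steps, num_cycles=0.3, warmup_steps=0):
    """Return the factor applied to the peak learning rate at a given step.

    Assumed formula: linear warmup, then
    0.5 * (1 + cos(pi * 2 * num_cycles * progress)), clipped at zero.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup training that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


PEAK_LR = 0.00006    # learning_rate from the README
TOTAL_STEPS = 1000   # hypothetical; not stated in the diff

for step in (0, 250, 500, 750, 1000):
    lr = PEAK_LR * cosine_multiplier(step, TOTAL_STEPS)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

Under these assumptions, with `num_cycles` at 0.3 the schedule covers only 0.6 of a half-period of the cosine, so training ends at roughly 35% of the peak learning rate. A value of 0.5 would be needed to reach zero at the last step.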