adamo1139/Yi-34B-200K-AEZAKMI-RAW-2301

adamo1139 committed on Jan 25

Commit 2423728 • 1 Parent(s): f99dc90

Update README.md

Files changed (1)

README.md +3 -1

@@ -48,6 +48,8 @@ One big issue I noticed is that I think I set too small of a learning rate for S
 Other small issue is that when you enter a prompt that might have resulted with refusal in a previous model, the response will be more free-form and probably will have a touch of completion in it.
 So far, it seems like the strongest anti-refusal bias is at 0 ctx - the first prompt. But it's also present, albeit a little bit less, further down. I plan to expand rawrr dataset and include more samples without system prompt, this should help here.
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" alt="made with Unsloth" width="400" height="64"/>](https://github.com/unslothai/unsloth)
 ## Unsloth training parameters DPO Stage
@@ -71,7 +73,7 @@ So far, it seems like the strongest anti-refusal bias is at 0 ctx - the first pr
 - lora_alpha: 32
 - max_length: 2200
 - learning_rate: 0.00006
-- lr_scheduler_type: "cosine
+- lr_scheduler_type: "cosine"
 - lr_scheduler_kwargs: {
     "num_cycles" : 0.3,
   }
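
Reviewer note: the `lr_scheduler_type: "cosine"` and `"num_cycles" : 0.3` settings above can be sanity-checked with a small standalone sketch. It assumes the trainer follows the cosine-with-warmup formula used by the Hugging Face `transformers` scheduler; the function name `cosine_lr_multiplier`, the step count, and the zero warmup are hypothetical and are not taken from the README. Under that assumption, a fractional `num_cycles` of 0.3 means the learning rate does not decay to zero but finishes at roughly a third of its peak.

```python
import math


def cosine_lr_multiplier(step, total_steps, num_cycles=0.3, warmup_steps=0):
    """Multiplier applied to the peak learning rate at a given step.

    Assumption: linear warmup followed by the cosine formula
    0.5 * (1 + cos(pi * 2 * num_cycles * progress)), clamped at 0.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


PEAK_LR = 0.00006    # learning_rate from the README
TOTAL_STEPS = 1000   # hypothetical; the real step count is not given

# Print the effective learning rate at a few points in training.
for step in (0, 250, 500, 750, 1000):
    lr = PEAK_LR * cosine_lr_multiplier(step, TOTAL_STEPS)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

With these assumed values, the final multiplier is 0.5 * (1 + cos(0.6π)), about 0.35, so training would end near 2.1e-05 instead of 0.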