G-reen committed
Commit 5bbf297 · verified · 1 Parent(s): 8e7c83c

Update README.md

  1. README.md +11 -18
README.md CHANGED
@@ -1,23 +1,16 @@
- ---
- language:
- - en
- license: apache-2.0
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - mistral
- - trl
- - dpo
- base_model: unsloth/mistral-7b-v0.2-bnb-4bit
- ---

- # Uploaded model

- - **Developed by:** G-reen
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/mistral-7b-v0.2-bnb-4bit

- This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ This model was trained as part of a series of experiments testing the performance of pure DPO vs SFT vs ORPO, all supported by Unsloth/Huggingface TRL.

+ Rank: 8
+ Alpha: 16
+ Learning rate: 5e-6
+ Beta: 0.1
+ Batch size: 8
+ Epochs: 1
+ Learning rate schedulers: Linear
+ Prompt Format: ``You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>``

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/8DQ0WiypkVIJeK_Y18Wv0.png)

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
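
The settings and prompt format added in this commit can be written down as a small Python sketch, which may help when reproducing the prompt layout at inference time. This is an illustration only and is not taken from the author's training code: the names `RunSettings`, `format_prompt`, and `lora_scaling` are hypothetical, and reading "Rank" and "Alpha" as LoRA rank and alpha (with the usual `alpha / rank` scaling) is an assumption based on the Unsloth 4-bit base model. Whether the literal `<s>` and `</s>` strings end up as special tokens depends on the tokenizer settings, which the README does not state.

```python
from dataclasses import dataclass

# System text copied from the README's "Prompt Format" line.
SYSTEM_PROMPT = "You are a helpful assistant."


@dataclass(frozen=True)
class RunSettings:
    """Hyperparameters listed in the README (hypothetical container)."""

    rank: int = 8                 # assumed to be the LoRA rank
    alpha: int = 16               # assumed to be the LoRA alpha
    learning_rate: float = 5e-6
    beta: float = 0.1             # assumed to be the DPO beta
    batch_size: int = 8
    epochs: int = 1
    lr_scheduler: str = "linear"

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scaling factor, alpha / rank (assumption: LoRA was used).
        return self.alpha / self.rank


def format_prompt(prompt: str, response: str = "") -> str:
    """Fill the README template; omit the response for generation-time prompts."""
    text = f"{SYSTEM_PROMPT}<s>[INST] {prompt} [/INST]"
    if response:
        text += f"{response}</s>"
    return text


if __name__ == "__main__":
    print(RunSettings())
    print(format_prompt("What is DPO?"))
```

Calling `format_prompt("What is DPO?")` yields the template with the response part left open, so the model's completion would continue directly after `[/INST]`.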