Shahradmz committed
Commit
839c719
1 Parent(s): e6cbef1

Model save

Files changed (1):
  README.md (+2 -3)
README.md CHANGED
```diff
@@ -1,6 +1,6 @@
 ---
 base_model: allenai/OLMo-1B-hf
-library_name: peft
+library_name: transformers
 model_name: OLMo-1B-hf-DPO-constitution-2
 tags:
 - generated_from_trainer
@@ -27,13 +27,12 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/shahrad_m/DPO-OLMo-1B-hf-constitution-2/runs/pstf92mj)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/shahrad_m/DPO-OLMo-1B-hf-constitution-2/runs/hpghf4ax)
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
 
 ### Framework versions
 
-- PEFT 0.13.2
 - TRL: 0.12.1
 - Transformers: 4.46.2
 - Pytorch: 2.5.1
```
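The README states the model was trained with DPO. As background on that method (not code from this repository), here is a minimal pure-Python sketch of the DPO loss for a single preference pair. The log-probability values and the choice of beta = 0.1 are illustrative assumptions, not numbers from this training run:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed log-probs of the
    chosen and rejected responses under the policy and reference models.

    beta scales the implicit reward (0.1 is a common default in TRL,
    used here only for illustration).
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as log1p(exp(-margin)) for stability
    return math.log1p(math.exp(-margin))

# Hypothetical log-probs: the policy has shifted toward the chosen
# response relative to the reference, so the margin is positive and
# the loss falls below log(2) (the value at margin 0).
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-20.0,
                ref_chosen_logp=-14.0, ref_rejected_logp=-18.0)
```

In TRL's `DPOTrainer` this same quantity is computed batchwise over token log-probs from the policy and a frozen reference model; the sketch above only shows the scalar form of the objective.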