Commit 90ee929 by natolambert (1 parent: f3f13a5)

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ datasets:
 # Llama-se-rl-peft
 Adapter weights of an RL fine-tuned model based on LLaMA (see Meta's LLaMA release for the original LLaMA model).
 For more info check out the [blog post](https://huggingface.co/blog/stackllama) and [github example](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts).
-
+The reward model used to train this model can be found [here](https://huggingface.co/trl-lib/llama-7b-se-rm-peft).
 
 ## Model Description
 **Llama-se-rl** is a Llama-based model that has been first fine-tuned on the Stack Exchange dataset and then RL fine-tuned using a Stack Exchange Reward Model.