NeMo
English
nvidia
llama3.1
reward model
arthrod commited on
Commit
c887398
1 Parent(s): e05f4c9

Update README.md

Browse files

Commenting out the two lines result in an error because we need the variable reward_each later on.

Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -111,7 +111,7 @@ python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
111
  2. Annotate data files using the served reward model. As an example, this can be the Open Assistant train/val files. Then follow the next step to train a SteerLM model based on [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html#step-5-train-the-attribute-conditioned-sft-model) .
112
 
113
  Please note that this script rounds the predicted floats to the nearest int (between 0 and 4 inclusive), as it's meant for SteerLM training.
114
- For other use cases (e.g. reward bench measurement, response filtering/ranking), we recommend using the floats directly, which can be done by commenting out [two lines of code in NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner/blob/main/examples/nlp/data/steerlm/attribute_annotate.py#L135-136)
115
 
116
  ```python
117
  python /opt/NeMo-Aligner/examples/nlp/data/steerlm/preprocess_openassistant_data.py --output_directory=data/oasst
 
111
  2. Annotate data files using the served reward model. As an example, this can be the Open Assistant train/val files. Then follow the next step to train a SteerLM model based on [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html#step-5-train-the-attribute-conditioned-sft-model) .
112
 
113
  Please note that this script rounds the predicted floats to the nearest int (between 0 and 4 inclusive), as it's meant for SteerLM training.
114
+ For other use cases (e.g. reward bench measurement, response filtering/ranking), we recommend using the floats directly, which can be done by commenting out [one line of code in NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner/blob/main/examples/nlp/data/steerlm/attribute_annotate.py#L136)
115
 
116
  ```python
117
  python /opt/NeMo-Aligner/examples/nlp/data/steerlm/preprocess_openassistant_data.py --output_directory=data/oasst