zhilinw committed
Commit 75daf1f
1 Parent(s): 4332a35

Update README.md

Files changed (1): README.md (+2 -0)
README.md CHANGED
@@ -25,6 +25,8 @@ datasets:
 
 The Nemotron-4-340B-Reward is a multi-dimensional Reward Model that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs; Nemotron-4-340B-Reward consists of the Nemotron-4-340B-Base model and a linear layer that converts the final layer representation of the end-of-response token into five scalar values, each corresponding to a [HelpSteer2](https://arxiv.org/abs/2406.08673) attribute.
 
+Try it for free at [build.nvidia.com](https://build.nvidia.com/nvidia/nemotron-4-340b-reward) - comes with an OpenAI-compatible API interface!
+
 It supports a context length of up to 4,096 tokens.
 
 Given a conversation with multiple turns between user and assistant, it rates the following attributes (typically between 0 and 4) for every assistant turn.
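The diff describes a model that maps each assistant turn to five scalar values, one per HelpSteer2 attribute (helpfulness, correctness, coherence, complexity, verbosity). A minimal sketch of the client side, assuming the scores come back as an ordered list of five scalars — the conversation content and the response layout here are illustrative assumptions, not the documented API shape:

```python
# Sketch: a multi-turn conversation in OpenAI-compatible message format,
# plus a helper that maps five raw scalars to named HelpSteer2 attributes.
# The attribute ordering and score values below are assumptions for illustration.

# Conversation with user and assistant turns; the reward model scores
# each assistant turn on the five attributes.
messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
]

# The five HelpSteer2 attributes, each rated typically between 0 and 4.
HELPSTEER2_ATTRIBUTES = [
    "helpfulness",
    "correctness",
    "coherence",
    "complexity",
    "verbosity",
]

def parse_rewards(raw_scores):
    """Pair a list of five scalar values with attribute names (assumed order)."""
    if len(raw_scores) != len(HELPSTEER2_ATTRIBUTES):
        raise ValueError("expected exactly five scalar values")
    return dict(zip(HELPSTEER2_ATTRIBUTES, raw_scores))

# Hypothetical scores for the assistant turn above.
rewards = parse_rewards([4.0, 4.0, 3.5, 0.5, 1.0])
print(rewards)
```

In a real call against the hosted endpoint, the `messages` list would be sent through an OpenAI-compatible client pointed at the build.nvidia.com base URL, and `parse_rewards` would be fed the scalars extracted from the response.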