Update README.md
# NV-Llama2-70B-RLHF-Chat

## Description
NV-Llama2-70B-RLHF-Chat is a 70-billion-parameter generative language model instruction-tuned from the [Llama2-70B](https://huggingface.co/meta-llama/Llama-2-70b) base model. It accepts inputs with a context length of up to 4,096 tokens. The model was fine-tuned for instruction following using Supervised Fine-Tuning (SFT) on the [NVIDIA SFT Datablend v1](https://huggingface.co/datasets/nvidia/sft_datablend_v1)[^1], followed by Reinforcement Learning from Human Feedback (RLHF) on the [HH-RLHF dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf), achieving 7.59 on MT-Bench and strong performance on academic benchmarks.
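
In practice, the 4,096-token context limit means long inputs must be truncated before generation. Below is a minimal, illustrative sketch of a left-truncation helper; the whitespace split is only a self-contained stand-in for the model's real tokenizer, and `truncate_to_context` is a hypothetical helper, not part of the released model code:

```python
# Illustrative sketch: enforce a context budget before sending a prompt to the
# model. A real deployment would count tokens with the model's own tokenizer;
# a whitespace split stands in here so the example is self-contained.
CONTEXT_LIMIT = 4096

def truncate_to_context(prompt: str, max_new_tokens: int, limit: int = CONTEXT_LIMIT) -> str:
    """Keep the most recent tokens so prompt + generation fits within `limit`."""
    budget = limit - max_new_tokens          # room left for the prompt itself
    tokens = prompt.split()                  # stand-in for real tokenization
    if len(tokens) <= budget:
        return prompt
    return " ".join(tokens[-budget:])        # drop the oldest tokens first

long_prompt = " ".join(f"tok{i}" for i in range(5000))
fitted = truncate_to_context(long_prompt, max_new_tokens=256)
print(len(fitted.split()))  # 3840: 4096 minus the 256 tokens reserved for generation
```

Dropping the oldest tokens first preserves the most recent turns of a conversation, which is the usual choice for chat-style inputs.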
NV-Llama2-70B-RLHF-Chat was trained with NVIDIA [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner), a scalable toolkit for performant and efficient model alignment. NeMo-Aligner is built on the [NeMo Framework](https://github.com/NVIDIA/NeMo), which allows training to scale to thousands of GPUs using tensor, data, and pipeline parallelism for all components of alignment. All of our checkpoints are cross-compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
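
As a back-of-the-envelope illustration of how the three parallelism axes compose, the total GPU count is the product of the tensor-, pipeline-, and data-parallel degrees. The figures below are hypothetical, not NVIDIA's actual training configuration:

```python
# Hypothetical 3D-parallel layout: the world size is the product of the three
# parallel degrees. The numbers are illustrative, not the actual training setup.
def world_size(tensor_parallel: int, pipeline_parallel: int, data_parallel: int) -> int:
    return tensor_parallel * pipeline_parallel * data_parallel

# e.g. TP=8 (one node of 8 GPUs), PP=4, DP=32
print(world_size(8, 4, 32))  # 1024 GPUs
```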
Try this model instantly, hosted for free by NVIDIA at the [NVIDIA AI Playground](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/nv-llama2-70b-rlhf). You can use it in the provided UI or through a limited-access API (up to 10,000 requests within 30 days). If you need more requests, we demonstrate below how to set up your own inference server.
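
Programmatic access to a hosted endpoint like this typically boils down to an authenticated JSON POST. The sketch below only builds the request, it does not send it; the endpoint URL, header names, and payload fields are illustrative placeholders, not the documented AI Playground API — consult the Playground documentation for the real schema:

```python
import json

# Hypothetical endpoint and request schema -- check the AI Playground docs for
# the real URL, auth header, and payload fields before using this.
ENDPOINT = "https://example.invalid/v1/nv-llama2-70b-rlhf/chat"

def build_request(api_key: str, user_message: str, max_tokens: int = 256):
    """Return (headers, body) for a hypothetical chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    })
    return headers, body

headers, body = build_request("YOUR_API_KEY", "Summarize RLHF in one sentence.")
# Sending it would then be e.g.: requests.post(ENDPOINT, headers=headers, data=body)
print(json.loads(body)["messages"][0]["role"])  # user
```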
[^1]: as well as ~5k proprietary datapoints that we are unable to release due to data vendor restrictions
<img src="https://huggingface.co/nvidia/NV-Llama2-70B-RLHF-Chat/resolve/main/mtbench_categories.png" alt="MT Bench Categories" width="800" style="margin-left: auto; margin-right: auto; display: block;"/>