nvidia
/

Llama3-ChatQA-1.5-70B

@@ -16,10 +16,10 @@ tags:
 We introduce Llama3-ChatQA-1.5, which excels at conversational question answering (QA) and retrieval-augmented generation (RAG). Llama3-ChatQA-1.5 is developed using an improved training recipe from [ChatQA (1.0)](https://arxiv.org/abs/2401.10225), and it is built on top of [Llama-3 base model](https://huggingface.co/meta-llama/Meta-Llama-3-8B). Specifically, we incorporate more conversational QA data to enhance its tabular and arithmetic calculation capability. Llama3-ChatQA-1.5 has two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. Both models were originally trained using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), we converted the checkpoints to Hugging Face format.
 ## Other Resources
-[Llama3-ChatQA-1.5-8B](https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B) &ensp; [Evaluation Data](https://huggingface.co/datasets/nvidia/ConvRAG-Bench) &ensp; [Training Data](https://huggingface.co/datasets/nvidia/ChatQA-Training-Data) &ensp; [Retriever](https://huggingface.co/nvidia/dragon-multiturn-query-encoder)
 ## Benchmark Results
-Results in ConvRAG Bench are as follows:
 | | ChatQA-1.0-7B | Command-R-Plus | Llama-3-instruct-70b | GPT-4-0613 | ChatQA-1.0-70B | ChatQA-1.5-8B | ChatQA-1.5-70B |
 | -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
@@ -36,7 +36,7 @@ Results in ConvRAG Bench are as follows:
 | Average (all) | 47.71 | 50.93 | 52.52 | 53.90 | 54.14 | 55.17 | 58.25 |
 | Average (exclude HybriDial) | 46.96 | 51.40 | 52.95 | 54.35 | 53.89 | 53.99 | 57.14 |
-Note that ChatQA-1.5 is built based on Llama-3 base model, and ChatQA-1.0 is built based on Llama-2 base model. ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure fair comparison, we also compare average scores excluding HybriDial. The data and evaluation scripts for ConvRAG can be found [here](https://huggingface.co/datasets/nvidia/ConvRAG-Bench).
 ## Prompt Format

 We introduce Llama3-ChatQA-1.5, which excels at conversational question answering (QA) and retrieval-augmented generation (RAG). Llama3-ChatQA-1.5 is developed using an improved training recipe from [ChatQA (1.0)](https://arxiv.org/abs/2401.10225), and it is built on top of [Llama-3 base model](https://huggingface.co/meta-llama/Meta-Llama-3-8B). Specifically, we incorporate more conversational QA data to enhance its tabular and arithmetic calculation capability. Llama3-ChatQA-1.5 has two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. Both models were originally trained using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), we converted the checkpoints to Hugging Face format.
 ## Other Resources
+[Llama3-ChatQA-1.5-8B](https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B) &ensp; [Evaluation Data](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) &ensp; [Training Data](https://huggingface.co/datasets/nvidia/ChatQA-Training-Data) &ensp; [Retriever](https://huggingface.co/nvidia/dragon-multiturn-query-encoder)
 ## Benchmark Results
+Results in ChatRAG Bench are as follows:
 | | ChatQA-1.0-7B | Command-R-Plus | Llama-3-instruct-70b | GPT-4-0613 | ChatQA-1.0-70B | ChatQA-1.5-8B | ChatQA-1.5-70B |
 | -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
 | Average (all) | 47.71 | 50.93 | 52.52 | 53.90 | 54.14 | 55.17 | 58.25 |
 | Average (exclude HybriDial) | 46.96 | 51.40 | 52.95 | 54.35 | 53.89 | 53.99 | 57.14 |
+Note that ChatQA-1.5 is built based on Llama-3 base model, and ChatQA-1.0 is built based on Llama-2 base model. ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure fair comparison, we also compare average scores excluding HybriDial. The data and evaluation scripts for ChatRAG can be found [here](https://huggingface.co/datasets/nvidia/ChatRAG-Bench).
 ## Prompt Format