zihanliu committed
Commit 6de370a
1 Parent(s): 10ca0d7

Update README.md

Files changed (1)
  1. README.md +25 -5
README.md CHANGED
@@ -19,7 +19,7 @@ We release ChatQA1.5, which excels at RAG-based conversational question answerin
Results in ConvRAG are as follows:

| | ChatQA-1.0-7B | Command-R-Plus | Llama-3-instruct-70b | GPT-4-0613 | ChatQA-1.0-70B | ChatQA-1.5-8B | ChatQA-1.5-70B |
- | -- | -- | -- | -- | -- | -- | -- | -- |
+ | -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| Doc2Dial | 37.88 | 33.51 | 37.88 | 34.16 | 38.9 | 39.33 | 41.26 |
| QuAC | 29.69 | 34.16 | 36.96 | 40.29 | 41.82 | 39.73 | 38.82 |
| QReCC | 46.97 | 49.77 | 51.34 | 52.01 | 48.05 | 49.03 | 51.40 |
@@ -33,7 +33,24 @@ Results in ConvRAG are as follows:
| Average (all) | 47.71 | 50.93 | 52.52 | 53.90 | 54.14 | 55.17 | 58.25 |
| Average (exclude HybriDial) | 46.96 | 51.40 | 52.95 | 54.35 | 53.89 | 53.99 | 57.14 |

- Note that ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure fair comparison, we also compare average scores excluding HybriDial.
+ Note that ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure a fair comparison, we also compare average scores excluding HybriDial. The data and evaluation scripts for ConvRAG can be found here.
+
+
+ ## Prompt Format
+ <pre>
+ System: {System}
+
+ {Context}
+
+ User: {Question}
+
+ Assistant: {Response}
+
+ User: {Question}
+
+ Assistant:
+ </pre>
+

## How to use
```python
@@ -57,6 +74,7 @@ def get_formatted_input(messages, context):

    for item in messages:
        if item['role'] == "user":
+           ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

@@ -88,15 +106,17 @@ response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

- ## Contact
+ ## Correspondence to
Zihan Liu (zihanl@nvidia.com), Wei Ping (wping@nvidia.com)

## Citation
- <pre>@article{liu2024chatqa,
+ <pre>
+ @article{liu2024chatqa,
title={ChatQA: Building GPT-4 Level Conversational QA Models},
author={Liu, Zihan and Ping, Wei and Roy, Rajarshi and Xu, Peng and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan},
journal={arXiv preprint arXiv:2401.10225},
- year={2024}}
+ </pre>


## License
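
For reference, here is a minimal Python sketch of how the new Prompt Format section maps onto the README's `get_formatted_input(messages, context)` helper. Only the user-turn loop is visible in this diff, so the `system` and `instruction` strings and the final prompt assembly below are assumptions, not the README's verbatim code:

```python
# Minimal sketch, not the verbatim README code: only the user-turn loop
# below appears in the diff; everything else is reconstructed to match
# the Prompt Format template above.
def get_formatted_input(messages, context):
    # Assumed system line and instruction text (not shown in this diff).
    system = "System: This is a chat between a user and an artificial intelligence assistant."
    instruction = "Please give a full and complete answer for the question."

    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    # Render the turns as "User: ..." / "Assistant: ..." per the template,
    # ending with a bare "Assistant:" so the model continues from there.
    conversation = "\n\n".join(
        ("User: " if turn['role'] == "user" else "Assistant: ") + turn['content']
        for turn in messages
    ) + "\n\nAssistant:"

    return system + "\n\n" + context + "\n\n" + conversation
```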
 
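Likewise, the usage hunks show only fragments of the `## How to use` block. Below is a self-contained sketch of how those fragments plug into a standard Hugging Face `transformers` generation loop; the checkpoint id `nvidia/Llama3-ChatQA-1.5-8B`, the example inputs, and the generation settings are assumptions, while the final two lines appear verbatim in the diff:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Checkpoint id is an assumption; the diff never names the model.
model_id = "nvidia/Llama3-ChatQA-1.5-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "user", "content": "What does ChatQA-1.5 excel at?"}
]
context = "ChatQA-1.5 excels at RAG-based conversational question answering."

# Build the prompt with the helper sketched above.
formatted_input = get_formatted_input(messages, context)
input_ids = tokenizer(formatted_input, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)

# These two lines appear verbatim in the diff: slice off the prompt tokens,
# then decode only the newly generated response.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```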