parambharat committed · Commit 85bfd70 · Parent: 049ff35

chore: fix citations and response format

rag/rag.py CHANGED (+14 -11)
@@ -29,11 +29,7 @@ Here are the relevant snippets from the Llama 3 405B model research paper:
 {context_str}
 </snippets>
 
-<question>
-{query_str}
-</question>
-
-To answer this question:
+To answer the question:
 
 1. Carefully read and analyze the provided snippets.
 2. Identify information that is directly relevant to the user's question.
@@ -50,11 +46,14 @@ Guidelines for your answer:
 6. Cite the relevant sentences from the snippets and their page numbers to support your answer.
 7. Answer in MFAQ format (Minimal Facts Answerable Question), providing the most concise and accurate response possible.
 8. Use Markdown to format your response and include citations to indicate the snippets and the page number used to derive your answer.
+9. Your answer must only have two headings: 'Answer' and 'Citations'.
 
 Here's an example of a question and an answer. You must use this as a template to format your response:
 
 <example>
-
+<question>
+What was the main mix of the training data ? How much data was used to train the model ?
+</question>
 
 ### Answer
 The main mix of the training data for the Llama 3 405 billion parameter model is as follows:
@@ -66,16 +65,20 @@ The main mix of the training data for the Llama 3 405 billion parameter model is
 
 Regarding the amount of data used to train the model, the snippets do not provide a specific total volume of data in terms of tokens or bytes. However, they do mention that the model was pre-trained on a large dataset containing knowledge until the end of 2023[^2^]. Additionally, the training process involved pre-training on 2.87 trillion tokens before further adjustments[^3^].
 
-###
+### Citations
 
-[^1^]: "Scaling Laws for Data Mix," page 6.
-[^2^]: "Pre-Training Data," page 4.
-[^3^]: "Initial Pre-Training," page 14.
+- [^1^]: "Scaling Laws for Data Mix," page 6.
+- [^2^]: "Pre-Training Data," page 4.
+- [^3^]: "Initial Pre-Training," page 14.
 
 </example>
 
 Remember, your role is to accurately convey the information from the research paper snippets, not to speculate or provide information from other sources.
 
+<question>
+{query_str}
+</question>
+
 Answer:
 """
 
@@ -113,7 +116,7 @@ class SimpleRAGPipeline(weave.Model):
             nodes,
             embed_model=self._get_embedding_model(),
             show_progress=True,
-            insert_batch_size=
+            insert_batch_size=512,
         )
 
         return index
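The net effect of the three prompt hunks is that `{query_str}` now appears exactly once, wrapped in `<question>` tags at the very end of the template, immediately before the `Answer:` cue. Below is a minimal sketch of how such a template gets filled; `SYSTEM_PROMPT_TEMPLATE` and `build_prompt` are illustrative names, not identifiers from rag/rag.py, and the template body is abridged.

```python
# Illustrative only: rag/rag.py may fill its prompt differently
# (e.g. via a llama_index PromptTemplate rather than str.format).
SYSTEM_PROMPT_TEMPLATE = """\
Here are the relevant snippets from the Llama 3 405B model research paper:

<snippets>
{context_str}
</snippets>

To answer the question:
[... guidelines and the worked example, elided ...]

<question>
{query_str}
</question>

Answer:
"""

def build_prompt(snippets: str, question: str) -> str:
    # Both placeholders are plain str.format fields, so filling the
    # template is a single call.
    return SYSTEM_PROMPT_TEMPLATE.format(context_str=snippets, query_str=question)
```

Placing the question after the long instruction block, right next to `Answer:`, is a common way to keep it from being buried by a large retrieved context.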
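New guideline 9 pins the response to exactly two headings, which is what makes the `### Citations` section in the example machine-checkable downstream. A sketch of how a caller could split such a response; `split_response` is a hypothetical helper, not a function in rag/rag.py.

```python
import re

def split_response(markdown: str) -> tuple[str, str]:
    """Split a model response into its 'Answer' and 'Citations' sections.

    Assumes the model followed guideline 9 and emitted exactly the two
    '###' headings shown in the prompt's example.
    """
    match = re.search(
        r"### Answer\s*(?P<answer>.*?)### Citations\s*(?P<citations>.*)",
        markdown,
        flags=re.DOTALL,
    )
    if match is None:
        raise ValueError("response did not follow the two-heading format")
    return match.group("answer").strip(), match.group("citations").strip()
```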
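The final hunk pins `insert_batch_size` to 512 in what the context lines suggest is a LlamaIndex `VectorStoreIndex` construction (the removed line appears truncated in this view, so the old value is not recoverable). A sketch of the surrounding call under that assumption, using the llama_index >= 0.10 import path; `build_index` is a standalone stand-in for the `SimpleRAGPipeline` method this hunk edits.

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.schema import BaseNode

def build_index(nodes: list[BaseNode], embed_model: BaseEmbedding) -> VectorStoreIndex:
    # Standalone variant of the call in the diff; in rag/rag.py the embed
    # model comes from self._get_embedding_model().
    return VectorStoreIndex(
        nodes,
        embed_model=embed_model,
        show_progress=True,
        # Nodes are embedded and inserted in batches of 512; llama_index
        # defaults to 2048 when the argument is omitted.
        insert_batch_size=512,
    )
```

Smaller batches mean more round trips to the embedding backend but smaller per-request payloads, which can matter when a hosted embedding API caps request size.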