writinwaters committed 8b7269c (parent: 22fe41e)

Updated RAGFlow UI (#3362)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update

Changed files:

- docker/README.md +8 -2
- docs/configurations.md +8 -2
- web/src/locales/en.ts +25 -34
docker/README.md

```diff
@@ -102,13 +102,19 @@ The [.env](./.env) file contains important environment variables for Docker.
 > - `RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:dev` or,
 > - `RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:dev`.
 
-###
+### Timezone
 
 - `TIMEZONE`
 The local time zone. Defaults to `'Asia/Shanghai'`.
+
+### Hugging Face mirror site
+
 - `HF_ENDPOINT`
 The mirror site for huggingface.co. It is disabled by default. You can uncomment this line if you have limited access to the primary Hugging Face domain.
-
+
+### MacOS
+
+- `MACOS`
 Optimizations for MacOS. It is disabled by default. You can uncomment this line if your OS is MacOS.
 
 ## 🐋 Service configuration
```
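The README entries above describe three optional `.env` variables: `TIMEZONE` (default `'Asia/Shanghai'`), plus `HF_ENDPOINT` and `MACOS`, which stay disabled until their lines are uncommented. The TypeScript sketch below is a hypothetical helper, not RAGFlow code, showing how those defaults could be resolved from `.env` text. It assumes a plain `KEY=value` format with `#` comment lines, and it assumes `MACOS` is switched on by the value `1` or `true`; the real Docker tooling may interpret values differently.

```typescript
// Hypothetical sketch, NOT part of RAGFlow: parse minimal .env-style text
// (KEY=value per line, '#' starts a comment) and apply the defaults the
// README describes for TIMEZONE, HF_ENDPOINT and MACOS.

interface DockerEnvSettings {
  timezone: string;    // TIMEZONE, defaults to 'Asia/Shanghai'
  hfEndpoint?: string; // HF_ENDPOINT, undefined while its line is commented out
  macos: boolean;      // MACOS, false while its line is commented out
}

function parseEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and commented-out lines such as "# HF_ENDPOINT=...".
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching single or double quotes around the value.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    vars.set(key, value);
  }
  return vars;
}

function resolveSettings(text: string): DockerEnvSettings {
  const vars = parseEnv(text);
  return {
    timezone: vars.get("TIMEZONE") ?? "Asia/Shanghai",
    hfEndpoint: vars.get("HF_ENDPOINT"),
    // Assumption: MACOS is enabled only by "1" or "true" (case-insensitive).
    macos: ["1", "true"].includes((vars.get("MACOS") ?? "").toLowerCase()),
  };
}
```

For example, `resolveSettings("TIMEZONE='Asia/Shanghai'\n# MACOS=1")` keeps the default time zone and leaves `macos` false, because the commented-out line is skipped.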
docs/configurations.md

```diff
@@ -123,13 +123,19 @@ If you cannot download the RAGFlow Docker image, try the following mirrors.
 - `RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:dev`.
 :::
 
-###
+### Timezone
 
 - `TIMEZONE`
 The local time zone. Defaults to `'Asia/Shanghai'`.
+
+### Hugging Face mirror site
+
 - `HF_ENDPOINT`
 The mirror site for huggingface.co. It is disabled by default. You can uncomment this line if you have limited access to the primary Hugging Face domain.
-
+
+### MacOS
+
+- `MACOS`
 Optimizations for MacOS. It is disabled by default. You can uncomment this line if your OS is MacOS.
 
 ## Service configuration
```
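The `HF_ENDPOINT` entry redirects model downloads from huggingface.co to a mirror. As far as we know, the Hugging Face Hub client reads this same environment variable, falls back to `https://huggingface.co`, and fetches files from `<endpoint>/<repo_id>/resolve/<revision>/<filename>`. The sketch below assumes that URL layout; `hfFileUrl` and the repository name used with it are hypothetical.

```typescript
// Sketch, not RAGFlow code: build a Hugging Face file-download URL,
// honouring an optional mirror endpoint such as the value of HF_ENDPOINT.

const DEFAULT_HF_ENDPOINT = "https://huggingface.co";

function hfFileUrl(
  repoId: string,
  filename: string,
  revision = "main",
  endpoint?: string,
): string {
  // Use the mirror when one is given, otherwise the primary domain.
  const chosen =
    endpoint !== undefined && endpoint.trim() !== "" ? endpoint.trim() : DEFAULT_HF_ENDPOINT;
  // Drop trailing slashes so the joined URL has no "//".
  const base = chosen.replace(/\/+$/, "");
  // Assumed Hub layout: <endpoint>/<repo_id>/resolve/<revision>/<filename>
  return `${base}/${repoId}/resolve/${revision}/${filename}`;
}
```

With an example mirror, `hfFileUrl("example-org/example-model", "config.json", "main", "https://hf-mirror.com")` returns `https://hf-mirror.com/example-org/example-model/resolve/main/config.json`; without one, the same call targets `https://huggingface.co`.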
web/src/locales/en.ts

```diff
@@ -200,43 +200,39 @@ export default {
 methodEmpty:
 'This will display a visual explanation of the knowledge base categories',
 book: `<p>Supported file formats are <b>DOCX</b>, <b>PDF</b>, <b>TXT</b>.</p><p>
-
-please setup the <i>page ranges</i> for every book in order eliminate negative effects and save computing time for analyzing.</p>`,
+For each book in PDF, please set the <i>page ranges</i> to remove unwanted information and reduce analysis time.</p>`,
 laws: `<p>Supported file formats are <b>DOCX</b>, <b>PDF</b>, <b>TXT</b>.</p><p>
-Legal documents
+Legal documents typically follow a rigorous writing format. We use text feature to identify split point.
 </p><p>
-The chunk granularity
+The chunk has a granularity consistent with 'ARTICLE', ensuring all upper level text is included in the chunk.
 </p>`,
 manual: `<p>Only <b>PDF</b> is supported.</p><p>
 We assume that the manual has a hierarchical section structure, using the lowest section titles as basic unit for chunking documents. Therefore, figures and tables in the same section will not be separated, which may result in larger chunk sizes.
 </p>`,
 naive: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML, HTML</b>.</p>
-<p>This method
+<p>This method chunks files using the 'naive' way: </p>
 <p>
-<li>
-<li>
+<li>Use vision detection model to split the texts into smaller segments.</li>
+<li>Then, combine adjacent segments until the token count exceeds the threshold specified by 'Chunk token number', at which point a chunk is created.</li></p>`,
 paper: `<p>Only <b>PDF</b> file is supported.</p><p>
-
-
-
-
-
-
-Every page will be treated as a chunk. And the thumbnail of every page will be stored.</p><p>
-<i>All the PPT files you uploaded will be chunked by using this method automatically, setting-up for every PPT file is not necessary.</i></p>`,
+Papers will be split by section, such as <i>abstract, 1.1, 1.2</i>. </p><p>
+This approach enables the LLM to summarize the paper more effectively and provide more comprehensive, understandable responses.
+However, it also increases the context for AI conversations and adds to the computational cost for the LLM. So during a conversation, consider reducing the value of ‘<b>topN</b>’.</p>`,
+presentation: `<p>Supported file formats are <b>PDF</b>, <b>PPTX</b>.</p><p>
+Every page in the slides is treated as a chunk, with its thumbnail image stored.</p><p>
+<i>This chunk method is automatically applied to all uploaded PPT files, so you do not need to specify it manually.</i></p>`,
 qa: `
 <p>
 This chunk method supports <b>EXCEL</b> and <b>CSV/TXT</b> file formats.
 </p>
 <li>
-If
+If a file is in <b>Excel</b> format, it should contain two columns
 without headers: one for questions and the other for answers, with the
 question column preceding the answer column. Multiple sheets are
-acceptable
+acceptable, provided the columns are properly structured.
 </li>
 <li>
-If
-used as the delimiter to separate questions and answers.
+If a file is in <b>CSV/TXT</b> format, it must be UTF-8 encoded with TAB as the delimiter to separate questions and answers.
 </li>
 <p>
 <i>
@@ -245,25 +241,20 @@ export default {
 </i>
 </p>
 `,
-resume: `<p>
+resume: `<p>Supported file formats are <b>DOCX</b>, <b>PDF</b>, <b>TXT</b>.
 </p><p>
-
-</p><p>
-Instead of chunking the résumé, we parse the résumé into structured data. As a HR, you can dump all the résumé you have,
-the you can list all the candidates that match the qualifications just by talk with <i>'RAGFlow'</i>.
+Résumés of various forms are parsed and organized into structured data to facilitate candidate search for recruiters.
 </p>
 `,
-table: `<p
-Here're some tips:
+table: `<p>Supported file formats are <b>EXCEL</b> and <b>CSV/TXT</b>.</p><p>
+Here're some prerequisites and tips:
 <ul>
-<li>For
-<li>The first
-<li>Column headers must be meaningful terms
-It
-
-
-<li>supplier/vendor<b>'TAB'</b>color(yellow, red, brown)<b>'TAB'</b>gender/sex(male, female)<b>'TAB'</b>size(M,L,XL,XXL)</li>
-<li>姓名/名字<b>'TAB'</b>电话/手机/微信<b>'TAB'</b>最高学历(高中,职高,硕士,本科,博士,初中,中技,中专,专科,专升本,MPA,MBA,EMBA)</li>
+<li>For CSV or TXT file, the delimiter between columns must be <em><b>TAB</b></em>.</li>
+<li>The first row must be column headers.</li>
+<li>Column headers must be meaningful terms to aid your LLM's understanding.
+It is good practice to juxtapose synonyms separated by a slash <i>'/'</i> and to enumerate values using brackets, for example: <i>'Gender/Sex (male, female)'</i>.<p>
+Here are some examples of headers:<ol>
+<li>supplier/vendor<b>'TAB'</b>Color (Yellow, Blue, Brown)<b>'TAB'</b>Sex/Gender (male, female)<b>'TAB'</b>size (M, L, XL, XXL)</li>
 </ol>
 </p>
 </li>
```
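The updated `naive` help text says adjacent segments are combined until their token count exceeds the 'Chunk token number' threshold, at which point a chunk is created. A minimal TypeScript sketch of that greedy merge step follows. It is not RAGFlow's implementation: it assumes the segments already exist (the vision-detection split is not modelled) and swaps in a whitespace word count for the real tokenizer. The names `countTokens` and `mergeSegments` are hypothetical.

```typescript
// Hypothetical sketch of the merge step described in the 'naive' help text.
// Assumptions: segments are already split, and tokens are approximated by
// whitespace-separated words instead of a real tokenizer.

function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function mergeSegments(segments: string[], chunkTokenNumber: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let tokens = 0;

  for (const segment of segments) {
    current.push(segment);
    tokens += countTokens(segment);
    // Once the running total exceeds the threshold, emit a chunk and start over.
    if (tokens > chunkTokenNumber) {
      chunks.push(current.join("\n"));
      current = [];
      tokens = 0;
    }
  }
  // Segments left over below the threshold form the final chunk.
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

With a threshold of 4, `mergeSegments(["a b c", "d e", "f g h i", "j"], 4)` returns `["a b c\nd e", "f g h i\nj"]`: the first chunk closes when the running total reaches 5, while "f g h i" alone (4 tokens) does not yet exceed the threshold. Because the check runs after a segment is appended, a chunk can overshoot the threshold by up to one segment; closing the chunk before adding the overflowing segment would be the alternative if chunks must stay under the limit.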
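The new `table` help text asks for a TAB-delimited header row in which synonyms are separated by `/` and allowed values are enumerated in brackets, as in `'Gender/Sex (male, female)'`. The following TypeScript parser is a hypothetical illustration of that convention, not RAGFlow's parser; `parseHeader` and `parseHeaderRow` are illustrative names.

```typescript
// Hypothetical sketch, not RAGFlow's parser: split a TAB-delimited header row
// and read each header using the convention from the 'table' help text:
// synonyms separated by '/', optional enumerated values in brackets.

interface ColumnHeader {
  synonyms: string[]; // e.g. ["Gender", "Sex"]
  values: string[];   // e.g. ["male", "female"]; empty when not enumerated
}

function parseHeader(header: string): ColumnHeader {
  // Group 1: the names; group 2 (optional): the text inside a trailing "(...)".
  const match = header.trim().match(/^([^()]*?)\s*(?:\((.*)\))?$/);
  const names = match ? match[1] : header;
  const enumerated = match && match[2] !== undefined ? match[2] : "";
  return {
    synonyms: names.split("/").map((s) => s.trim()).filter((s) => s !== ""),
    values: enumerated.split(",").map((v) => v.trim()).filter((v) => v !== ""),
  };
}

function parseHeaderRow(line: string): ColumnHeader[] {
  // The help text requires TAB as the column delimiter for CSV/TXT input.
  return line.split("\t").map(parseHeader);
}
```

Applied to the example header row from the diff (`supplier/vendor`, `Color (Yellow, Blue, Brown)`, `Sex/Gender (male, female)` and `size (M, L, XL, XXL)` joined by TABs), it yields four columns; the third, for instance, becomes `{ synonyms: ["Sex", "Gender"], values: ["male", "female"] }`.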