ali6parmak committed on
Commit
2aa7f88
·
verified ·
1 Parent(s): 6f639a7

Update README.md

Files changed (1)
  1. README.md +62 -3
README.md CHANGED
@@ -1,3 +1,62 @@
- ---
- license: openrail
- ---
+ ---
+ license: openrail
+ ---
+
+
+ <h3 align="center">PDF Document Layout Analysis</h3>
+ <p align="center">Models for extracting segments, along with their types, from a PDF</p>
+
+ This model card provides the non-visual models used in our pdf-document-layout-analysis service:
+
+ https://github.com/huridocs/pdf-document-layout-analysis
+
+ This service segments and classifies the different parts of PDF pages, identifying elements such as text, titles, pictures, and tables. It also determines the correct order of the identified elements.
+
+
+ ## Quick Start
+
+ Clone the service:
+
+ git clone https://github.com/huridocs/pdf-document-layout-analysis.git
+ cd pdf-document-layout-analysis
+
+ Start the service:
+
+ # With GPU support:
+ make start
+
+ # Without GPU support [Recommended if you do not have a GPU on your system]
+ make start_no_gpu
+
+
+ Get the segments of a PDF:
+
+ # With visual models
+ curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060
+
+ # With non-visual models [with the models in this model card]
+ curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060/fast
+
+
+ When the process is done, the output will include a list of SegmentBox elements, and every SegmentBox element will have this information:
+
+ {
+ "left": Left position of the segment
+ "top": Top position of the segment
+ "width": Width of the segment
+ "height": Height of the segment
+ "page_number": Page number which the segment belongs to
+ "text": Text inside the segment
+ "type": Type of the segment
+ }
+
+
+ To stop the server:
+
+ make stop
+
+
+ For more information, you can refer to:
+
+ https://github.com/huridocs/pdf-document-layout-analysis
+
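The SegmentBox list described in the README could be consumed on the client side roughly as follows. This is a minimal sketch, not part of the service: it assumes the response body is a JSON array of objects carrying the seven documented fields, and the `SegmentBox` dataclass, the `parse_segments` and `texts_by_type` helpers, the sample payload, and the type strings `"Title"` and `"Text"` are all hypothetical names and values chosen for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SegmentBox:
    # Field names follow the README; the Python types are assumptions.
    left: float
    top: float
    width: float
    height: float
    page_number: int
    text: str
    type: str


# Names of the documented fields, used to ignore any extra keys in the response.
FIELD_NAMES = [f.name for f in fields(SegmentBox)]


def parse_segments(response_body: str) -> list[SegmentBox]:
    """Parse a response body assumed to be a JSON array of segment objects."""
    items = json.loads(response_body)
    return [SegmentBox(**{name: item[name] for name in FIELD_NAMES}) for item in items]


def texts_by_type(segments: list[SegmentBox], segment_type: str) -> list[str]:
    """Return the text of every segment whose type matches segment_type."""
    return [s.text for s in segments if s.type == segment_type]


# Illustrative payload only; this is not real output from the service.
sample = json.dumps([
    {"left": 72.0, "top": 60.0, "width": 450.0, "height": 24.0,
     "page_number": 1, "text": "Annual Report", "type": "Title"},
    {"left": 72.0, "top": 100.0, "width": 450.0, "height": 80.0,
     "page_number": 1, "text": "This report covers...", "type": "Text"},
])

segments = parse_segments(sample)
titles = texts_by_type(segments, "Title")
```

In practice, `sample` would be replaced by the body returned from the `curl` calls shown in the README; the exact type strings the service emits should be checked against its repository.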