yifeihu committed on
Commit
1e326dd
·
1 Parent(s): a569455

update readme

Files changed (1)
  1. README.md +89 -0
README.md CHANGED
@@ -1,3 +1,92 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
  license: mit
3
+ license_link: https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/LICENSE
4
+ pipeline_tag: image-text-to-text
5
+ tags:
6
+ - vision
7
+ - ocr
8
+ - segmentation
9
  ---
10
+
11
+ # TFT-ID: Table/Figure/Text IDentifier for academic papers
12
+
13
+ ## Model Summary
14
+
15
+ TFT-ID (Table/Figure/Text IDentifier) is a family of object detection models, created by [Yifei Hu](https://x.com/hu_yifei), finetuned to extract tables, figures, and text sections from academic papers.
16
+
17
+ TFT-ID is finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints.
18
+
19
+ - The models were finetuned with papers from Hugging Face Daily Papers. All bounding boxes are manually annotated and checked by humans.
20
+ - TFT-ID models take an image of a single paper page as input and return bounding boxes for all tables, figures, and text sections on that page.
21
+ - The detected text sections contain clean text content suitable for downstream OCR workflows. TFT-ID itself is not an OCR model: it returns bounding boxes, not recognized text.
22
+
23
+ ![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/td-id-caption.png)
24
+
25
+ Object Detection results format:
26
+ {'\<OD>': {'bboxes': [[x1, y1, x2, y2], ...],
27
+ 'labels': ['label1', 'label2', ...]} }
28
+
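The result format above can be regrouped by label before further processing. The following is a minimal sketch, not part of the model's own tooling: the helper name `group_boxes_by_label`, the example boxes, and the label strings `"table"` and `"figure"` are assumptions made for illustration, so check the labels your checkpoint actually emits.

```python
from collections import defaultdict


def group_boxes_by_label(parsed_answer, task="<OD>"):
    """Return {label: [[x1, y1, x2, y2], ...]} from an object detection result.

    Hypothetical helper: it only assumes the documented structure
    {'<OD>': {'bboxes': [...], 'labels': [...]}}.
    """
    result = parsed_answer[task]
    grouped = defaultdict(list)
    # bboxes and labels are parallel lists: the i-th label names the i-th box.
    for bbox, label in zip(result["bboxes"], result["labels"]):
        grouped[label].append(bbox)
    return dict(grouped)


# Made-up example data; the label strings are assumptions.
example = {
    "<OD>": {
        "bboxes": [[10, 20, 110, 220], [30, 240, 400, 480], [15, 500, 300, 700]],
        "labels": ["table", "figure", "table"],
    }
}

grouped = group_boxes_by_label(example)
for label, boxes in grouped.items():
    print(label, len(boxes))
```

Grouping this way makes it straightforward to, for example, crop only the table regions and pass them to a separate table parser.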
29
+ ## Training Code and Dataset
30
+ - Dataset: Coming soon.
31
+ - Code: [github.com/ai8hyf/TF-ID](https://github.com/ai8hyf/TF-ID)
32
+
33
+ ## Benchmarks
34
+
35
+ We tested the models on paper pages outside the training dataset. The papers are a subset of Hugging Face Daily Papers.
36
+
37
+ Correct output: the model draws correct bounding boxes for every table, figure, and text section on the given page without missing any content.
38
+
39
+ | Model | Total Images | Correct Output | Success Rate |
40
+ |---------------------------------------------------------------|--------------|----------------|--------------|
41
+ | TFT-ID-1.0[[HF]](https://huggingface.co/yifeihu/TFT-ID-1.0) | 373 | 361 | 96.78% |
42
+
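The success rate in the table is the number of fully correct pages divided by the total number of pages. As a quick check of that arithmetic (the function name `success_rate` is only illustrative):

```python
def success_rate(correct, total):
    """Percentage of pages with fully correct output, rounded to 2 decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 2)


# Figures taken from the benchmark table: 361 correct out of 373 pages.
rate = success_rate(361, 373)
print(f"{rate}%")
```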
43
+ Depending on the use case, some "incorrect" output can still be perfectly usable. For example, the model may draw two bounding boxes for one figure that has two child components.
44
+
45
+ ## How to Get Started with the Model
46
+
47
+ Use the code below to get started with the model.
48
+
49
+ ```python
50
+ import requests
51
+ from PIL import Image
52
+ from transformers import AutoProcessor, AutoModelForCausalLM
53
+
54
+ model = AutoModelForCausalLM.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)
55
+ processor = AutoProcessor.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)
56
+
57
+ prompt = "<OD>"
58
+
59
+ url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
60
+ image = Image.open(requests.get(url, stream=True).raw)
61
+
62
+ inputs = processor(text=prompt, images=image, return_tensors="pt")
63
+
64
+ generated_ids = model.generate(
65
+     input_ids=inputs["input_ids"],
66
+     pixel_values=inputs["pixel_values"],
67
+     max_new_tokens=1024,
68
+     do_sample=False,
69
+     num_beams=3
70
+ )
71
+ generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
72
+
73
+ parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
74
+
75
+ print(parsed_answer)
76
+
77
+ ```
78
+
79
+ To visualize the results, see [this tutorial notebook](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb) for more details.
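For a quick local check without the notebook, the boxes can also be drawn directly with Pillow. This is a rough sketch under stated assumptions, not the method used in the tutorial: `draw_detections` is a hypothetical helper, and the blank page and single box below are made-up stand-ins for a real page image and a real `parsed_answer`.

```python
from PIL import Image, ImageDraw


def draw_detections(image, parsed_answer, task="<OD>", color="red", width=3):
    """Return a copy of `image` with each detected box outlined and labelled.

    Hypothetical helper: assumes the documented result structure
    {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}}
    with coordinates in pixels.
    """
    annotated = image.convert("RGB")  # convert() returns a new image
    draw = ImageDraw.Draw(annotated)
    result = parsed_answer[task]
    for (x1, y1, x2, y2), label in zip(result["bboxes"], result["labels"]):
        draw.rectangle([x1, y1, x2, y2], outline=color, width=width)
        # Put the label just inside the top-left corner of the box.
        draw.text((x1 + 4, y1 + 4), label, fill=color)
    return annotated


# Stand-in data so the sketch runs without downloading the model.
page = Image.new("RGB", (200, 300), "white")
fake_answer = {"<OD>": {"bboxes": [[20, 30, 180, 120]], "labels": ["figure"]}}

annotated = draw_detections(page, fake_answer)
annotated.save("annotated_page.png")
```

With the real model, pass the `image` and `parsed_answer` variables from the snippet above instead of the stand-in data.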
80
+
81
+ ## BibTeX and citation info
82
+
83
+ ```
84
+ @misc{TF-ID,
85
+ author = {Yifei Hu},
86
+ title = {TF-ID: Table/Figure IDentifier for academic papers},
87
+ year = {2024},
88
+ publisher = {GitHub},
89
+ journal = {GitHub repository},
90
+ howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
91
+ }
92
+ ```