Extracting Tables from PDFs without Extra Content: Using Table Transformer for Accurate Recognition

#1
by ashanq - opened

I have uploaded an image that contains extra content along with the table. However, the extraction returns the other text from the PDF as well, not just the table. I think you should add Table Transformer to first detect the table; then you can extract just that table into HTML or any other format.
[Attached image: Screenshot 2024-12-25 165256.png]
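A minimal sketch of the second half of the suggested pipeline, assuming a detector such as Table Transformer has already returned the table's bounding box, and that words with their boxes come from the PDF text layer or OCR. The function names, the `(text, box)` input format, and the `row_tol` row-grouping heuristic are hypothetical choices for illustration. A real table-structure model handles columns and spanning cells far better than this naive grouping; the point is only that text outside the detected box never reaches the HTML.

```python
from html import escape

# Hypothetical box format: (x0, y0, x1, y1) in page coordinates, y grows downward.
Box = tuple[float, float, float, float]


def inside(word_box: Box, table_box: Box) -> bool:
    """True if the centre of word_box lies within table_box."""
    cx = (word_box[0] + word_box[2]) / 2
    cy = (word_box[1] + word_box[3]) / 2
    x0, y0, x1, y1 = table_box
    return x0 <= cx <= x1 and y0 <= cy <= y1


def table_words_to_html(words, table_box: Box, row_tol: float = 5.0) -> str:
    """Keep only words inside table_box, group them into rows by y, emit HTML.

    words: list of (text, box) pairs (assumed input format).
    row_tol: max vertical offset between a word and its row's first word.
    """
    kept = [(t, b) for t, b in words if inside(b, table_box)]
    kept.sort(key=lambda w: (w[1][1], w[1][0]))  # top-to-bottom, then left-to-right
    rows = []  # each entry: (row_top_y, [(x0, text), ...])
    for text, box in kept:
        if rows and abs(box[1] - rows[-1][0]) <= row_tol:
            rows[-1][1].append((box[0], text))
        else:
            rows.append((box[1], [(box[0], text)]))
    lines = ["<table>"]
    for _, cells in rows:
        cells.sort()  # order cells left-to-right by x0
        tds = "".join(f"<td>{escape(text)}</td>" for _, text in cells)
        lines.append(f"  <tr>{tds}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


words = [
    ("Quarterly report", (10, 5, 120, 15)),  # heading above the table
    ("Name", (20, 50, 60, 60)),
    ("Score", (100, 51, 140, 61)),
    ("Ann", (20, 70, 50, 80)),
    ("92", (100, 70, 120, 80)),
    ("Page 1", (10, 300, 50, 310)),  # footer below the table
]
table_box = (15, 45, 150, 90)  # e.g. a box returned by a table detector
print(table_words_to_html(words, table_box))
# prints a two-row table (Name/Score, Ann/92); heading and footer are dropped
```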

ashanq changed discussion title from Extracting Tables from PDFs with Extra Content: Using Table Transformer for Accurate Recognition to Extracting Tables from PDFs without Extra Content: Using Table Transformer for Accurate Recognition

Actually, there is already a repo for table extraction: https://huggingface.co/spaces/Joker1212/RapidTableDetection

For PDFs, we also have a layout repo, https://github.com/RapidAI/RapidLayout, with many ONNX models; it is easy to use.
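A layout model labels several region types on a page, so only the table regions should be passed on to table extraction. The sketch below does not use RapidLayout's actual API; it assumes a generic detector output of boxes with a class label and a confidence score, and that table regions carry the label `"table"` (check the class names of the specific model you load). `Detection`, `table_regions`, `min_score`, and `pad` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed format
    label: str  # assumed class name, e.g. "table", "text", "figure"
    score: float  # detector confidence in [0, 1]


def table_regions(detections, page_w, page_h, min_score=0.5, pad=5.0):
    """Return padded table boxes, highest score first, clipped to the page."""
    tables = [d for d in detections if d.label == "table" and d.score >= min_score]
    tables.sort(key=lambda d: d.score, reverse=True)
    out = []
    for d in tables:
        x0, y0, x1, y1 = d.box
        # A small margin keeps border text that the detector box may clip.
        out.append((max(0.0, x0 - pad), max(0.0, y0 - pad),
                    min(page_w, x1 + pad), min(page_h, y1 + pad)))
    return out


dets = [
    Detection((50, 100, 500, 400), "table", 0.93),
    Detection((50, 20, 500, 80), "text", 0.97),     # not a table: skipped
    Detection((40, 450, 300, 600), "table", 0.31),  # low confidence: skipped
    Detection((2, 830, 200, 840), "table", 0.80),   # near page edge: clipped
]
print(table_regions(dets, page_w=595, page_h=842))
```

Each returned box can then be cropped from the rendered page image (for example with Pillow's `Image.crop((left, upper, right, lower))`) before running table recognition, so surrounding page text is never seen.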
