Early New Year releases
Hi HuggingFacers🤗, I decided to ship early this year, and here's what I came up with:
PdfItDown (https://github.com/AstraBert/PdfItDown) - If, like me, you have your whole RAG pipeline optimized for PDFs but not for other data formats, here is your solution! With PdfItDown, you can convert Word documents, presentations, HTML pages, Markdown files and (why not?) CSVs and XMLs to PDF, for seamless integration with your RAG pipelines. Built upon MarkItDown by Microsoft.
GitHub Repo: https://github.com/AstraBert/PdfItDown
PyPI Package: https://pypi.org/project/pdfitdown/
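To picture how a converter like this slots in front of a PDF-only pipeline, here is a minimal sketch that plans which files in a folder would need converting. It does not call PdfItDown itself: the `plan_conversions` helper, the `CONVERTIBLE` extension set and the output naming are all assumptions made for illustration, so check the repo for the actual API and the formats it really supports.

```python
from pathlib import Path

# Assumed extension set, based on the formats named in the post
# (hypothetical, not PdfItDown's own list).
CONVERTIBLE = {".docx", ".pptx", ".html", ".md", ".csv", ".xml"}


def plan_conversions(paths, out_dir):
    """Map each usable input file to the PDF path the pipeline should read.

    Files that are already PDFs map to themselves; convertible files map to
    a .pdf path inside out_dir; anything else is left out of the plan.
    """
    out_dir = Path(out_dir)
    plan = {}
    for p in map(Path, paths):
        suffix = p.suffix.lower()
        if suffix == ".pdf":
            plan[str(p)] = str(p)
        elif suffix in CONVERTIBLE:
            plan[str(p)] = str(out_dir / (p.stem + ".pdf"))
    return plan
```

The actual conversion step would then run once per entry whose source and target differ, using whatever call the package provides.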
SenTrEv v1.0.0 (https://github.com/AstraBert/SenTrEv/tree/v1.0.0) - If you need to evaluate the retrieval performance of your text embedding models, I have good news for you🥳🥳
The new release of SenTrEv now supports dense and sparse retrieval (thanks to FastEmbed by Qdrant) with text-based file formats (.docx, .pptx, .csv, .html, .xml, .md, .pdf) and new relevance metrics!
GitHub Repo: https://github.com/AstraBert/SenTrEv
Release Notes: https://github.com/AstraBert/SenTrEv/releases/tag/v1.0.0
PyPI Package: https://pypi.org/project/sentrev/
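The post does not say which relevance metrics were added, so purely as an illustration, here is a sketch of two common ones for retrieval evaluation: success rate at k and mean reciprocal rank (MRR). The function names and the data layout are assumptions; SenTrEv's own definitions may differ, so see the release notes for what it actually computes.

```python
def success_rate_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries whose relevant document appears in the top k results."""
    hits = sum(1 for ranked, rel in zip(ranked_ids, relevant_ids) if rel in ranked[:k])
    return hits / len(relevant_ids)


def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the relevant document (0 when it is not retrieved)."""
    total = 0.0
    for ranked, rel in zip(ranked_ids, relevant_ids):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(relevant_ids)


# Toy data: one ranked result list per query, and the single relevant document for each.
ranked = [["d1", "d2", "d3"], ["d5", "d4", "d6"], ["d7", "d8", "d9"]]
relevant = ["d1", "d4", "d0"]

print(success_rate_at_k(ranked, relevant, k=1))
print(mean_reciprocal_rank(ranked, relevant))
```

Both functions assume exactly one relevant document per query and at least one query; a real evaluation would also need to handle multiple relevant chunks and empty inputs.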
Happy New Year and have fun!🥂