Commit History

Merge pull request #14 from soumik12345/feat/ensemble-of-image-loaders
bf14736
unverified

geekyrakshit committed on

bugfix: prevent model load on every extraction
ff75fe0

mratanusarkar committed on

chore: address review points
2691833

mratanusarkar committed on

update: docs with lib sources and notes
e6f2eb8

mratanusarkar committed on

add: docs for pymupdf and fitzpil
04ea7bb

mratanusarkar committed on

add: two modules on fitz to handle img extractions
f9d44bd

mratanusarkar committed on

temp: attempt - force to png with pillow
3d948a1

mratanusarkar committed on

temp: attempt - all format img extraction from pdf
5406446

mratanusarkar committed on

add: docs for pdfplumber image loader
e19286a

mratanusarkar committed on

add: hacky impl of img extraction with pdfplumber
4fd52cf

mratanusarkar committed on

add: example usage for marker and pdf2img loaders
bf0f2e5

mratanusarkar committed on

add: marker image loader + docs + corrections
331f289

mratanusarkar committed on

chore: improve doc + code formatting
f37090a

mratanusarkar committed on

add: docs for base img loader + pdf2image
cc5cebc

mratanusarkar committed on

add: base image loader + pdf2img from load_image
5c74069

mratanusarkar committed on

Merge pull request #11 from soumik12345/feat/semantic-chunking
694a076
unverified

Atanu Sarkar committed on

add: docs for SemanticChunker
24a271d

geekyrakshit committed on

add: SemanticChunker
ace03e3

geekyrakshit committed on

add: SemanticChunker
49d583d

geekyrakshit committed on

Merge pull request #9 from soumik12345/feat/ensemble-of-text-loaders
56d3953
unverified

geekyrakshit committed on

update: gitignore + untrack uv.lock
07a16a7

mratanusarkar committed on

update: codebase addressing review comments
a24da3d

mratanusarkar committed on

update: docs with lib sources to help find kwargs
d822059

mratanusarkar committed on

add: kwargs to interact with underlying library
6526b2f

mratanusarkar committed on

fix: incorrect pypdf2 as dev dependency
d191c1b

mratanusarkar committed on

update: convert _process_page to extract_page_data
e31ec78

mratanusarkar committed on

add: docs & docstrings for marker text loader
fc27062

mratanusarkar committed on

add: marker pdf text loader
fb5095f

mratanusarkar committed on

install: marker-pdf v0.2.17
ba60fc7

mratanusarkar committed on

add: docs & docstrings for pdfplumber text loader
d647546

mratanusarkar committed on

add: pdfplumber text loader
be6fbc6

mratanusarkar committed on

install: pdfplumber v0.11.4
3494fdb

mratanusarkar committed on

add: docs & docstrings for pypdf2 text loader
419f968

mratanusarkar committed on

add: pypdf2 loader text loader
391b2f3

mratanusarkar committed on

chore: format & linting + __init__ + fix: imports
e0aff18

mratanusarkar committed on

chore: remove old load_text
78dd8e8

mratanusarkar committed on

add: docs & docstrings for base + pymupdf4llm
4304db6

mratanusarkar committed on

add: base text loader and pymupdf4llm loader
9761deb

mratanusarkar committed on

Merge pull request #4 from soumik12345/feat/colpali-retrieval
bb79bf4
unverified

geekyrakshit committed on

add: MultiModalRetriever.predict
d197e7f

geekyrakshit committed on

update: colpali index syncs with wandb artifact
abd20d0

geekyrakshit committed on

add: installation instructions
9a6c015

geekyrakshit committed on

add: installation script
c052c0a

geekyrakshit committed on

update: documentation with installation instructions
24e7c59

geekyrakshit committed on

add: MultiModalRetriever
7df75ff

geekyrakshit committed on

update: ImageLoader
a7ff122

geekyrakshit committed on

update: ImageLoader
bd0ff68

geekyrakshit committed on

Merge pull request #3 from soumik12345/feat/image-loader
d529654
unverified

geekyrakshit committed on

Merge pull request #2 from soumik12345/feat/text-loading
d889dc6
unverified

geekyrakshit committed on