PyPDF2
scikit-learn
transformers
PyMuPDF
pytesseract
pillow