langchain
openai
streamlit
pinecone-client
chromadb
unstructured
pdf2image
pytesseract
tiktoken