openai
langchain
beautifulsoup4
chromadb
tiktoken
pypdf
gradio
PyMuPDF
gdown
docx2txt
sentence-transformers
ibm-watson-machine-learning
ibm-generative-ai
html5lib
lxml
# unstructured[all-docs]