Spaces:
Sleeping
title: DocuChatDeepSeek
emoji: ⚡
colorFrom: yellow
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
short_description: Deepseek-DocuChat – Simple, intuitive, and descriptive.
📄 DocuChat - AI-Powered RAG Chatbot DocuChat is a Retrieval-Augmented Generation (RAG) chatbot powered by DeepSeek and built with Streamlit. It allows users to upload documents (PDF, Word, Markdown) or provide a web link, process the content, and ask questions about it. The application uses semantic embeddings and a FAISS vector database for efficient retrieval and question-answering.
🚀 Features Document Upload: Upload PDF, Word (.docx), or Markdown (.md) files.
Web Link Support: Provide a web link to extract and process content.
Semantic Search: Generate embeddings using sentence-transformers for semantic understanding.
Efficient Retrieval: Store embeddings in a FAISS vector database for fast and accurate querying.
Question-Answering: Use DeepSeek API for intelligent question-answering capabilities.
User-Friendly Interface: Built with Streamlit for an interactive and intuitive UI.
🛠️ Installation Clone the Repository:
git clone https://github.com/your-username/DocuChat.git cd DocuChat
Install Dependencies: Make sure you have Python 3.8+ installed. Then, install the required packages:
pip install -r requirements.txt Set Up DeepSeek API Key:
Obtain your API key from DeepSeek.
Add the API key in the Streamlit app when prompted.
🖥️ Usage Run the Application:
streamlit run app.py Input Your DeepSeek API Key:
Enter your API key in the provided field.
Upload a Document or Enter a Web Link:
Choose between uploading a document (PDF, Word, or Markdown) or providing a web link.
Ask Questions:
Once the document is processed, ask questions about its content.
🧩 How It Works Document Processing:
The uploaded document or web content is split into smaller chunks for efficient processing.
Semantic embeddings are generated using sentence-transformers.
Vector Database:
Embeddings are stored in a FAISS vector database for fast and accurate retrieval.
Question-Answering:
When a user asks a question, the app retrieves the most relevant chunks from the vector database.
The DeepSeek API generates a response based on the retrieved information.
📂 File Structure Copy DocuChat/ ├── app.py # Main Streamlit application ├── requirements.txt # List of dependencies ├── README.md # Project documentation └── .gitignore # Files to ignore in Git
📝 Requirements Python 3.8+ Streamlit LangChain FAISS Sentence-Transformers PyPDF Docx2txt Unstructured (for Markdown files) WebBaseLoader (for web links)
🔧 Dependencies Install all dependencies using: pip install -r requirements.txt
🌟 Why DocuChat? Efficient: Processes documents once and retrieves answers quickly.
Versatile: Supports multiple file types and web links.
Intelligent: Uses state-of-the-art AI models for semantic understanding and question-answering.
User-Friendly: Simple and intuitive interface powered by Streamlit.
🤝 Contributing Contributions are welcome! If you'd like to contribute, please follow these steps:
Fork the repository.
Create a new branch (git checkout -b feature/YourFeatureName).
Commit your changes (git commit -m 'Add some feature').
Push to the branch (git push origin feature/YourFeatureName).
Open a pull request.
📜 License This project is licensed under the MIT License. See the LICENSE file for details.
🙏 Acknowledgments DeepSeek for providing the question-answering API.
LangChain for the document processing and retrieval framework.
Streamlit for the interactive UI framework.
Sentence-Transformers for semantic embeddings.
📧 Contact For questions or feedback, feel free to reach out:
GitHub - https://github.com/schalise
Enjoy using DocuChat! 🎉 Let your documents speak for themselves. 🗣️