README.md · chalisesagun/DocuChatDeepSeek at 946c19c08e63bc79d94114587af6a1c86be90d73

metadata

title: DocuChatDeepSeek
emoji: ⚡
colorFrom: yellow
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
short_description: Deepseek-DocuChat – Simple, intuitive, and descriptive.

📄 DocuChat - AI-Powered RAG Chatbot DocuChat is a Retrieval-Augmented Generation (RAG) chatbot powered by DeepSeek and built with Streamlit. It allows users to upload documents (PDF, Word, Markdown) or provide a web link, process the content, and ask questions about it. The application uses semantic embeddings and a FAISS vector database for efficient retrieval and question-answering.

🚀 Features Document Upload: Upload PDF, Word (.docx), or Markdown (.md) files.

Web Link Support: Provide a web link to extract and process content.

Semantic Search: Generate embeddings using sentence-transformers for semantic understanding.

Efficient Retrieval: Store embeddings in a FAISS vector database for fast and accurate querying.

Question-Answering: Use DeepSeek API for intelligent question-answering capabilities.

User-Friendly Interface: Built with Streamlit for an interactive and intuitive UI.

🛠️ Installation Clone the Repository:

git clone https://github.com/your-username/DocuChat.git cd DocuChat

Install Dependencies: Make sure you have Python 3.8+ installed. Then, install the required packages:

pip install -r requirements.txt Set Up DeepSeek API Key:

Obtain your API key from DeepSeek.

Add the API key in the Streamlit app when prompted.

🖥️ Usage Run the Application:

streamlit run app.py Input Your DeepSeek API Key:

Enter your API key in the provided field.

Upload a Document or Enter a Web Link:

Choose between uploading a document (PDF, Word, or Markdown) or providing a web link.

Ask Questions:

Once the document is processed, ask questions about its content.

🧩 How It Works Document Processing:

The uploaded document or web content is split into smaller chunks for efficient processing.

Semantic embeddings are generated using sentence-transformers.

Vector Database:

Embeddings are stored in a FAISS vector database for fast and accurate retrieval.

Question-Answering:

When a user asks a question, the app retrieves the most relevant chunks from the vector database.

The DeepSeek API generates a response based on the retrieved information.

📂 File Structure Copy DocuChat/ ├── app.py # Main Streamlit application ├── requirements.txt # List of dependencies ├── README.md # Project documentation └── .gitignore # Files to ignore in Git

📝 Requirements Python 3.8+ Streamlit LangChain FAISS Sentence-Transformers PyPDF Docx2txt Unstructured (for Markdown files) WebBaseLoader (for web links)

🔧 Dependencies Install all dependencies using: pip install -r requirements.txt

🌟 Why DocuChat? Efficient: Processes documents once and retrieves answers quickly.

Versatile: Supports multiple file types and web links.

Intelligent: Uses state-of-the-art AI models for semantic understanding and question-answering.

User-Friendly: Simple and intuitive interface powered by Streamlit.

🤝 Contributing Contributions are welcome! If you'd like to contribute, please follow these steps:

Fork the repository.

Create a new branch (git checkout -b feature/YourFeatureName).

Commit your changes (git commit -m 'Add some feature').

Push to the branch (git push origin feature/YourFeatureName).

Open a pull request.

📜 License This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments DeepSeek for providing the question-answering API.

LangChain for the document processing and retrieval framework.

Streamlit for the interactive UI framework.

Sentence-Transformers for semantic embeddings.

📧 Contact For questions or feedback, feel free to reach out:

sagunchalise@gmail.com

GitHub - https://github.com/schalise

Enjoy using DocuChat! 🎉 Let your documents speak for themselves. 🗣️