aadi8anant
commited on
Commit
β’
5c9a23e
1
Parent(s):
6ad2654
Upload 2 files
Browse files- README.md +59 -12
- requirements.txt +8 -0
README.md
CHANGED
@@ -1,12 +1,59 @@
|
|
1 |
-
|
2 |
-
|
3 |
-
|
4 |
-
|
5 |
-
|
6 |
-
|
7 |
-
|
8 |
-
|
9 |
-
|
10 |
-
|
11 |
-
|
12 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# π€ DocBot: Smart Document ChatBot
|
2 |
+
|
3 |
+
DocBot is an intelligent document processing application with a chatbot interface. It can process various types of documents, including PDFs and images, extract essential information, and enable user interaction through a chat interface.
|
4 |
+
|
5 |
+
## βοΈ Features
|
6 |
+
|
7 |
+
- **Document Upload**: Upload PDF, PNG, JPG, or JPEG files for processing.
|
8 |
+
- **Text Extraction**: Extract text content from uploaded documents.
|
9 |
+
- **Image Processing**: Convert PDF documents to images and extract text from images.
|
10 |
+
- **Chatbot Interface**: Interact with the document through a chatbot interface powered by Groq.
|
11 |
+
- **Natural Language Understanding**: Utilizes spaCy for natural language processing.
|
12 |
+
- **Dynamic Progress Bar**: Visual feedback on document processing progress.
|
13 |
+
- **Error Handling**: Provides error messages for any processing failures.
|
14 |
+
|
15 |
+
## βοΈ Installation
|
16 |
+
|
17 |
+
1. Clone the repository:
|
18 |
+
|
19 |
+
```bash
|
20 |
+
git clone https://github.com/yourusername/docbot.git
|
21 |
+
```
|
22 |
+
|
23 |
+
2. Install the required Python packages:
|
24 |
+
|
25 |
+
```bash
|
26 |
+
pip install -r requirements.txt
|
27 |
+
```
|
28 |
+
|
29 |
+
3. Set up the environment variables:
|
30 |
+
|
31 |
+
Create a `.env` file in the root directory and add the following:
|
32 |
+
|
33 |
+
```dotenv
|
34 |
+
GROQ_API_KEY='your_groq_api_key'
|
35 |
+
```
|
36 |
+
|
37 |
+
4. Run the Streamlit app:
|
38 |
+
|
39 |
+
```bash
|
40 |
+
streamlit run app.py
|
41 |
+
```
|
42 |
+
|
43 |
+
## π Usage
|
44 |
+
|
45 |
+
1. Run the Streamlit app using the provided installation instructions.
|
46 |
+
2. Upload your document using the file uploader.
|
47 |
+
3. Wait for the document to be processed.
|
48 |
+
4. Interact with the document by asking questions in the chatbot interface.
|
49 |
+
|
50 |
+
## π» Technologies Used
|
51 |
+
|
52 |
+
- [Streamlit](https://streamlit.io/) - For building the interactive web application.
|
53 |
+
- [PyPDF2](https://pythonhosted.org/PyPDF2/) - For PDF document processing.
|
54 |
+
- [pdf2image](https://github.com/Belval/pdf2image) - For converting PDFs to images.
|
55 |
+
- [PyMuPDF](https://pypi.org/project/PyMuPDF/) - For PDF document rendering.
|
56 |
+
- [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) - For extracting text from images.
|
57 |
+
- [spaCy](https://spacy.io/) - For natural language processing.
|
58 |
+
- [Groq](https://github.com/groq/groq-py) - For AI-powered chatbot interaction.
|
59 |
+
- [Pillow](https://python-pillow.org/) - For image processing.
|
requirements.txt
ADDED
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
streamlit==1.11.0
|
2 |
+
PyPDF2==1.26.0
|
3 |
+
pdf2image==1.16.0
|
4 |
+
pytesseract==0.3.9
|
5 |
+
Pillow==9.2.0
|
6 |
+
spacy==3.3.1
|
7 |
+
transformers==4.21.1
|
8 |
+
requests==2.28.1
|