aadi8anant committed on
Commit 5c9a23e • 1 Parent(s): 6ad2654

Upload 2 files

Files changed (2)
  1. README.md +59 -12
  2. requirements.txt +8 -0
README.md CHANGED
@@ -1,12 +1,59 @@
- ---
- title: DocBot
- emoji: 🐒
- colorFrom: indigo
- colorTo: pink
- sdk: streamlit
- sdk_version: 1.35.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🤖 DocBot: Smart Document ChatBot
+
+ DocBot is a document processing application with a chatbot interface. It accepts PDFs and images, extracts their text, and lets users ask questions about the content through a chat interface.
+
+ ## ⭐️ Features
+
+ - **Document Upload**: Upload PDF, PNG, JPG, or JPEG files for processing.
+ - **Text Extraction**: Extract text content from uploaded documents.
+ - **Image Processing**: Convert PDF documents to images and extract text from images.
+ - **Chatbot Interface**: Ask questions about the document through a chatbot interface powered by Groq.
+ - **Natural Language Understanding**: Uses spaCy for natural language processing.
+ - **Dynamic Progress Bar**: Shows visual feedback on document processing progress.
+ - **Error Handling**: Shows an error message when processing fails.
+
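The upload and error-handling features above imply some routing by file type. The diff does not show `app.py`, so the following is only a minimal sketch under that assumption: `choose_pipeline` and its return values are hypothetical names, and the extension list is taken from the "Document Upload" bullet.

```python
from pathlib import Path

# Extensions taken from the "Document Upload" feature in the README.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}
PDF_EXTENSIONS = {".pdf"}


def choose_pipeline(filename: str) -> str:
    """Return "pdf" or "image" for a supported upload, else raise ValueError.

    Hypothetical helper: the name and return values are illustrative only.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"    # e.g. render pages to images, then run OCR
    if suffix in IMAGE_EXTENSIONS:
        return "image"  # e.g. run OCR on the image directly
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")


if __name__ == "__main__":
    for name in ("report.PDF", "scan.jpeg", "notes.txt"):
        try:
            print(name, "->", choose_pipeline(name))
        except ValueError as exc:
            print(name, "->", exc)
```

Raising `ValueError` for unsupported files gives the UI layer a single place to catch the failure and show an error message.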
+ ## ⚙️ Installation
+
+ 1. Clone the repository and change into it:
+
+ ```bash
+ git clone https://github.com/yourusername/docbot.git
+ cd docbot
+ ```
+
+ 2. Install the required Python packages:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Set up the environment variables:
+
+ Create a `.env` file in the root directory and add the following:
+
+ ```dotenv
+ GROQ_API_KEY='your_groq_api_key'
+ ```
+
+ 4. Run the Streamlit app:
+
+ ```bash
+ streamlit run app.py
+ ```
+
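Step 3 stores the Groq key in a `.env` file, but the diff does not show how `app.py` reads it. As a hedged illustration only, the sketch below parses `KEY=value` lines with the standard library; `load_env_file` and `get_groq_api_key` are hypothetical helpers, not functions from the project, and the real app may well use a dedicated library instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=value lines, stripping optional single or double quotes."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_groq_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GROQ_API_KEY")
    if not key and Path(path).exists():
        key = load_env_file(path).get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set")
    return key
```

Checking the process environment first lets a hosted deployment supply the key as a secret without a `.env` file on disk.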
+ ## 🚀 Usage
+
+ 1. Start the Streamlit app as described under Installation.
+ 2. Upload your document using the file uploader.
+ 3. Wait for the document to be processed.
+ 4. Ask questions about the document in the chatbot interface.
+
51
+
52
+ - [Streamlit](https://streamlit.io/) - For building the interactive web application.
53
+ - [PyPDF2](https://pythonhosted.org/PyPDF2/) - For PDF document processing.
54
+ - [pdf2image](https://github.com/Belval/pdf2image) - For converting PDFs to images.
55
+ - [PyMuPDF](https://pypi.org/project/PyMuPDF/) - For PDF document rendering.
56
+ - [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) - For extracting text from images.
57
+ - [spaCy](https://spacy.io/) - For natural language processing.
58
+ - [Groq](https://github.com/groq/groq-py) - For AI-powered chatbot interaction.
59
+ - [Pillow](https://python-pillow.org/) - For image processing.
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ streamlit==1.11.0
+ PyPDF2==1.26.0
+ pdf2image==1.16.0
+ pytesseract==0.3.9
+ Pillow==9.2.0
+ spacy==3.3.1
+ transformers==4.21.1
+ requests==2.28.1
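The pins above can be compared with the README's technology list. The sketch below is one way to do that, not part of the commit: `parse_pins` and `missing_from_pins` are hypothetical helpers, and the mapping from README display names to package names (for example "Tesseract OCR" to `pytesseract`) is an assumption.

```python
def parse_pins(text: str) -> dict:
    """Map lower-cased package names to versions for `name==version` lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


def missing_from_pins(expected: list, pins: dict) -> list:
    """Return the expected package names that have no pin, in input order."""
    return [name for name in expected if name.lower() not in pins]


# Contents of requirements.txt as added in this commit.
REQUIREMENTS = """\
streamlit==1.11.0
PyPDF2==1.26.0
pdf2image==1.16.0
pytesseract==0.3.9
Pillow==9.2.0
spacy==3.3.1
transformers==4.21.1
requests==2.28.1
"""

# Assumed package names for the README's "Technologies Used" entries.
README_PACKAGES = [
    "streamlit", "pypdf2", "pdf2image", "pymupdf",
    "pytesseract", "spacy", "groq", "pillow",
]

if __name__ == "__main__":
    print(missing_from_pins(README_PACKAGES, parse_pins(REQUIREMENTS)))
    # prints ['pymupdf', 'groq']
```

Under these assumptions the check reports that PyMuPDF and Groq, both named in the README, have no entry in `requirements.txt`.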