Carlos Salgado committed 8c7984b (parent: 2755970)

update README

README.md CHANGED
@@ -1,7 +1,6 @@
-
 ---
 title: DocVerifyRAG
-emoji:
+emoji: 🖺
 colorFrom: pink
 colorTo: green
 sdk: streamlit
@@ -11,89 +10,75 @@ pinned: false
 ---
 
 <!-- PROJECT TITLE -->
-<h1 align="center">DocVerifyRAG:
+<h1 align="center">DocVerifyRAG: Anomaly detection for BIM document metadata</h1>
 <div id="header" align="center">
 </div>
 <h2 align="center">
 Description
 </h2>
-<p align="center"> DocVerifyRAG
+<p align="center"> Introducing DocVerifyRAG, a cutting-edge solution revolutionizing document verification processes across various sectors. Our app goes beyond mere document classification; it focuses on ensuring metadata accuracy by cross-referencing against a vast vector database of exemplary cases. Inspired by the necessity for precise data management, DocVerifyRAG leverages AI to scrutinize document metadata, instantly flagging anomalies and offering suggested corrections. Powered by Vectara vector store technology and supported by the innovative capabilities of together.ai API, our app employs advanced anomaly detection algorithms to scrutinize metadata, ensuring compliance with regulatory standards and enhancing data integrity. With DocVerifyRAG, users can effortlessly verify document metadata accuracy, minimizing errors and streamlining operational efficiency.</p>
 
 ## Table of Contents
 
 <details>
 <summary>DocVerifyRAG</summary>
 
-- [Application Description](#application-description)
 - [Table of Contents](#table-of-contents)
-- [
-- [
-- [
-- [
+- [TRY the prototype](#try-the-prototype)
+- [Screenshots](#screenshots)
+- [Technology Stack](#technology-stack)
+- [Features](#features)
+- [Install locally](#install-locally)
+- [Usage](#usage)
 - [Authors](#authors)
 - [License](#license)
 
 </details>
 
 ## TRY the prototype
-[DocVerifyRAG](https://
+[DocVerifyRAG](https://docverifyrag.vercel.app/)
 
 ## Screenshots
-
-[Add screenshots here]
+![ttthh](https://github.com/eliawaefler/DocVerifyRAG/assets/19821445/331845d7-a360-4315-92ef-d4bb50021eaa)
 
 ## Technology Stack
+| Technology | Description |
+| --- | --- |
+| **Python** | Primary programming language used for development. |
+| **LangChain** | Framework for developing applications powered by large language models (LLMs). |
+| **Vectara** | Provides efficient vector search capabilities via the Boomerang model in a "RAG as a service" architecture. |
+| **intfloat/multilingual-e5-large** | Generates efficient and performant multilingual language embeddings. |
+| **Together AI** | Platform for training, fine-tuning, and deploying gen AI models. Its inference API was used with the model `mistralai/Mixtral-8x7B-Instruct-v0.1`. |
+| **Streamlit** | Open-source Python library for creating custom web apps, used as the frontend. |
+| **Hugging Face Spaces** | Service for developer-friendly deployments of data applications. |
+
+The backend is built using Python, LangChain, Vectara, and Together AI's inference API with the `mistralai/Mixtral-8x7B-Instruct-v0.1` model for processing and understanding large amounts of data. Streamlit is used for the frontend, providing an intuitive interface for users. Hugging Face Spaces simplifies the deployment process, making the application easily accessible.
 
-| Technology | Description |
-| ---------- | --------------------------- |
-| AI/ML | Artificial Intelligence and Machine Learning |
-| Python | Programming Language |
-| Flask | Web Framework |
-| Docker | Containerization |
-| Tech Name | Short description |
 
 ### Features
 
-1. **
-
-
+1. **Metadata Verification:**
+- Cross-references document metadata against a comprehensive vector database of exemplary cases.
+- Instantly identifies anomalies and discrepancies, ensuring metadata accuracy and compliance.
 
-2. **
-
-
+2. **Automated Metadata Correction:**
+- Offers suggested metadata corrections based on processed PDF files, facilitating swift and accurate adjustments.
+- Potential for automated inspection of numerous metadata rows for seamless large-scale data verification.
 
-3. **
-
-
+3. **Question Answering Retriever:**
+- Utilizes Vectara vector store technology for efficient retrieval of relevant information.
+- Employs Hugging Face embeddings E5 multilingual model for precise analysis of multilingual data.
+- Identifies anomalies in names, descriptions, and disciplines, providing actionable insights for data accuracy.
 
-
+4. **User-Friendly Interface:**
+- Intuitive web interface for effortless document upload, metadata verification, and correction.
+- Simplifies document management processes, reducing manual effort and enhancing operational efficiency.
 
-
+### Install locally
 
 1. Clone the repository:
 ```bash
-$ git clone https://github.com/
-```
-
-2. Navigate to the frontend directory:
-```bash
-$ cd DocVerifyRAG/frontend
-```
-
-3. Install dependencies:
-```bash
-$ npm install
-```
-4. Run project:
-```bash
-$ npm run dev
-```
-
-#### Step 2 - Backend
-
-1. Navigate to the backend directory:
-```bash
-$ cd DocVerifyRAG/backend
+$ git clone https://github.com/salgadev/DocVerifyRAG.git
 ```
 
 2. Install dependencies:
@@ -101,25 +86,14 @@ pinned: false
 $ pip install -r requirements.txt
 ```
 
-
-
-To deploy DocVerifyRAG using Docker, follow these steps:
-
-1. Pull the Docker image from Docker Hub:
-
-```bash
-$ docker pull sandra/docverifyrag:latest
-```
-
-2. Run the Docker container:
-
+3. Run using Streamlit:
 ```bash
-$
+$ streamlit run app.py
 ```
 
 ### Usage
 
-Access the web interface and follow the prompts to upload documents, classify them, and verify metadata. The AI-powered anomaly detection system will automatically flag any discrepancies or errors in the document metadata, providing accurate and reliable document management solutions
+Access the web interface and follow the prompts to upload documents, classify them, and verify metadata. The AI-powered anomaly detection system will automatically flag any discrepancies or errors in the document metadata, providing accurate and reliable document management solutions.
 ## Authors
 
 | Name | Link |
@@ -127,9 +101,8 @@ Access the web interface and follow the prompts to upload documents, classify th
 | Sandra Ashipala | [GitHub](https://github.com/sandramsc) |
 | Elia Wäfler | [GitHub](https://github.com/eliawaefler) |
 | Carlos Salgado | [GitHub](https://github.com/salgadev) |
-| Abdul Qadeer | [GitHub](https://github.com/AbdulQadeer-55) |
 
 
 ## License
 
-[![GitLicense](https://img.shields.io/badge/License-MIT-lime.svg)](https://github.com/eliawaefler/DocVerifyRAG/blob/main/LICENSE)
+[![GitLicense](https://img.shields.io/badge/License-MIT-lime.svg)](https://github.com/eliawaefler/DocVerifyRAG/blob/main/LICENSE)
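The updated README's "Metadata Verification" feature cross-references a document's metadata against exemplary cases, flags anomalies and suggests a correction. That idea can be illustrated with a nearest-neighbour consistency check.

This is a minimal sketch under stated assumptions, not DocVerifyRAG's actual code:

- The `Record` schema, `embed`, `cosine`, `check_discipline` and the sample records are all hypothetical.
- A toy bag-of-words cosine similarity stands in for the `intfloat/multilingual-e5-large` embeddings and the Vectara retrieval named in the Technology Stack.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical metadata row: the README mentions names, descriptions,
    # and disciplines, but does not specify the real schema.
    name: str
    description: str
    discipline: str


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def check_discipline(record: Record, exemplars: list[Record], k: int = 3) -> tuple[bool, str]:
    """Flag `record` when its discipline disagrees with the majority
    discipline of its k most similar exemplars (compared by description).

    Returns (is_anomaly, suggested_discipline).
    """
    if not exemplars:
        raise ValueError("need at least one exemplar record")
    query = embed(record.description)
    # Rank exemplars by similarity; sorted() is stable, so ties keep input order.
    ranked = sorted(
        exemplars,
        key=lambda ex: cosine(query, embed(ex.description)),
        reverse=True,
    )
    # Majority vote over the k nearest; Counter.most_common breaks ties in
    # first-seen order, i.e. in favour of the closest neighbour's label.
    votes = Counter(ex.discipline for ex in ranked[:k])
    suggestion = votes.most_common(1)[0][0]
    return record.discipline != suggestion, suggestion


# Hypothetical exemplary cases, e.g. rows already verified by a reviewer.
exemplars = [
    Record("E-101", "electrical wiring plan ground floor", "Electrical"),
    Record("E-102", "electrical lighting layout first floor", "Electrical"),
    Record("E-103", "electrical socket wiring plan", "Electrical"),
    Record("S-201", "structural steel beam details", "Structural"),
    Record("S-202", "structural concrete slab reinforcement", "Structural"),
]

suspect = Record("X-001", "electrical wiring plan second floor", "Structural")
print(check_discipline(suspect, exemplars))  # (True, 'Electrical')
```

A majority vote over several neighbours, rather than copying the single nearest exemplar, makes the suggested correction less sensitive to one mislabeled exemplar. In a real deployment, `embed` and the ranking step would presumably be replaced by an embedding model and a vector-store query.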