---
title: LoomRAG
emoji: 🔍
colorFrom: indigo
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: 🧠 Multimodal RAG that "weaves" together text and images 🪡
---
# 🔍 LoomRAG: Multimodal Retrieval-Augmented Generation for AI-Powered Search
<a href="https://huggingface.co/spaces/NotShrirang/LoomRAG"><img src="https://img.shields.io/badge/Streamlit%20App-red?style=flat-rounded-square&logo=streamlit&labelColor=white"/></a>

This project implements **LoomRAG**, a Multimodal Retrieval-Augmented Generation (RAG) system that leverages OpenAI's CLIP model for neural cross-modal retrieval and semantic search. Users can submit text queries and seamlessly retrieve both text and image results through vector embeddings. The system features a comprehensive annotation interface for creating custom datasets, supports CLIP fine-tuning with configurable parameters for domain-specific applications, and accepts image and PDF uploads for enhanced interaction and intelligent retrieval, all through a Streamlit-based interface.
Experience the project in action:

[Try LoomRAG on Hugging Face Spaces](https://huggingface.co/spaces/NotShrirang/LoomRAG)

---
## 📸 Implementation Screenshots

|  |  |
| ---------------- | ------------------------ |
| Data Upload Page | Data Search / Retrieval  |
|  |  |
| Data Annotation Page | CLIP Fine-Tuning     |

---
## ✨ Features

- 🔍 **Cross-Modal Retrieval**: Search with text to retrieve both text and image results using deep learning
- 🌐 **Streamlit Interface**: Provides a user-friendly web interface for interacting with the system
- 📤 **Upload Options**: Allows users to upload images and PDFs for AI-powered processing and retrieval
- 🧠 **Embedding-Based Search**: Uses OpenAI's CLIP model to align text and image embeddings in a shared latent space
- 📝 **Augmented Text Generation**: Enhances text results using LLMs for contextually rich outputs
- 🏷️ **Image Annotation**: Enables users to annotate uploaded images through an intuitive interface
- 🎯 **CLIP Fine-Tuning**: Supports custom model training with configurable parameters, including test split size, learning rate, optimizer, and weight decay
- 🎨 **Fine-Tuned Model Integration**: Seamlessly loads fine-tuned CLIP models for enhanced search and retrieval

---
## 🏗️ Architecture Overview

1. **Data Indexing**:

   - Text, images, and PDFs are preprocessed and embedded using the CLIP model
   - Embeddings are stored in a vector database for fast and efficient retrieval (a minimal indexing sketch follows below)
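
   The indexing code itself isn't reproduced in this README; a minimal sketch of this step, assuming the Hugging Face `transformers` CLIP implementation and a flat FAISS index (the model name and image path are illustrative), might look like:

   ```python
   import faiss
   import torch
   from PIL import Image
   from transformers import CLIPModel, CLIPProcessor

   model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
   processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

   def embed_texts(texts):
       inputs = processor(text=texts, return_tensors="pt", padding=True, truncation=True)
       with torch.no_grad():
           feats = model.get_text_features(**inputs)
       feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize for cosine search
       return feats.numpy().astype("float32")

   def embed_images(paths):
       images = [Image.open(p).convert("RGB") for p in paths]
       inputs = processor(images=images, return_tensors="pt")
       with torch.no_grad():
           feats = model.get_image_features(**inputs)
       feats = feats / feats.norm(dim=-1, keepdim=True)
       return feats.numpy().astype("float32")

   # Inner product over unit vectors equals cosine similarity.
   index = faiss.IndexFlatIP(model.config.projection_dim)
   index.add(embed_texts(["a sunset over mountains"]))
   index.add(embed_images(["photo.jpg"]))  # hypothetical uploaded image
   ```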
2. **Query Processing**:

   - Text queries are converted into embeddings for semantic search
   - Uploaded images and PDFs are processed and embedded for comparison
   - The system performs a nearest-neighbor search in the vector database to retrieve relevant text and images (see the sketch below)
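
   Continuing the sketch above, query-time retrieval reduces to a nearest-neighbor search over the same index (`embed_texts` and `index` come from the indexing sketch):

   ```python
   # Embed the query with the same CLIP text encoder used at indexing time,
   # then take the top-k neighbors by cosine similarity.
   query_vec = embed_texts(["sunset over mountains"])
   scores, ids = index.search(query_vec, 5)
   for score, idx in zip(scores[0], ids[0]):
       print(f"item {idx}: similarity {score:.3f}")  # idx maps back to stored text/image metadata
   ```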
3. **Response Generation**:

   - For text results: optionally refined or augmented using a language model (a hedged sketch follows below)
   - For image results: directly returned or enhanced with image captions
   - For PDFs: extracts text content and provides relevant sections
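
   The exact LLM behind the augmentation step isn't pinned down in this README; as one possibility, wiring it to a chat-completions API could look like this (the model choice and prompt are illustrative assumptions):

   ```python
   from openai import OpenAI

   client = OpenAI()  # reads OPENAI_API_KEY from the environment

   def augment(query: str, retrieved_chunks: list[str]) -> str:
       """Refine retrieved text into a contextually rich answer."""
       context = "\n\n".join(retrieved_chunks)
       response = client.chat.completions.create(
           model="gpt-4o-mini",  # illustrative model choice
           messages=[
               {"role": "system", "content": "Answer using only the provided context."},
               {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
           ],
       )
       return response.choices[0].message.content
   ```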
4. **Image Annotation**:

   - Dedicated annotation page for managing uploaded images
   - Support for creating and managing multiple datasets simultaneously
   - Flexible annotation workflow for efficient data labeling
   - Dataset organization and management capabilities
5. **Model Fine-Tuning**:

   - Custom CLIP model training on annotated images
   - Configurable training parameters for optimization (a rough training-step sketch follows below)
   - Integration of fine-tuned models into the search pipeline
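
   The training loop isn't shown here; a rough sketch of a single contrastive training step with the Hugging Face CLIP implementation (the hyperparameter values stand in for the ones configurable in the UI) might be:

   ```python
   import torch
   from torch.optim import AdamW
   from transformers import CLIPModel, CLIPProcessor

   model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
   processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
   optimizer = AdamW(model.parameters(), lr=1e-6, weight_decay=0.01)  # placeholders for UI-configured values
   model.train()

   def train_step(images, captions):
       """One optimization step over a batch of (PIL image, caption) pairs."""
       inputs = processor(text=captions, images=images, return_tensors="pt",
                          padding=True, truncation=True)
       outputs = model(**inputs, return_loss=True)  # CLIP's symmetric contrastive loss
       outputs.loss.backward()
       optimizer.step()
       optimizer.zero_grad()
       return outputs.loss.item()
   ```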
---
## 🚀 Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/NotShrirang/LoomRAG.git
   cd LoomRAG
   ```

2. Create a virtual environment and install dependencies:

   ```bash
   python -m venv venv
   source venv/bin/activate  # on Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```

---
## 📖 Usage

1. **Running the Streamlit Interface**:

   - Start the Streamlit app:

     ```bash
     streamlit run app.py
     ```

   - Access the interface in your browser to:
     - Submit natural language queries
     - Upload images or PDFs to retrieve contextually relevant results
     - Annotate uploaded images
     - Fine-tune CLIP models with custom parameters
     - Use fine-tuned models for improved search results

2. **Example Queries**:

   - **Text Query**: "sunset over mountains"
     Output: An image of a sunset over mountains along with descriptive text
   - **PDF Upload**: Upload a PDF of a scientific paper
     Output: Extracted key sections or contextually relevant images

---
## ⚙️ Configuration

- 📊 **Vector Database**: Uses FAISS for efficient similarity search
- 🤖 **Model**: Uses OpenAI CLIP for neural embedding generation
- ✍️ **Augmentation**: Optional LLM-based augmentation for text responses
- 🎛️ **Fine-Tuning**: Configurable parameters for model training and optimization (a model-loading sketch follows below)
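
Assuming fine-tuned weights are saved in the standard Hugging Face format (the directory name below is hypothetical), swapping them into the search pipeline could be as simple as:

```python
from transformers import CLIPModel, CLIPProcessor

# Hypothetical local checkpoint directory produced by the fine-tuning page.
model = CLIPModel.from_pretrained("./finetuned-clip")
processor = CLIPProcessor.from_pretrained("./finetuned-clip")
model.eval()  # embeddings from this model then replace the base CLIP at index/query time
```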
---
## 🗺️ Roadmap

- [x] Fine-tuning CLIP for domain-specific datasets
- [ ] Adding support for audio and video modalities
- [ ] Improving the re-ranking system for better contextual relevance
- [ ] Enhanced PDF parsing with semantic section segmentation

---
## 🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request for any feature requests or bug fixes.

---

## 📄 License

This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

- [OpenAI CLIP](https://openai.com/research/clip)
- [FAISS](https://github.com/facebookresearch/faiss)
- [Hugging Face](https://huggingface.co/)