course-search-av / instructions.txt
Rohil Bansal
This assignment involves building a **smart search tool** for Analytics Vidhya's free courses. Here's a breakdown of how you can approach it:
### Steps to Complete the Assignment
#### 1. **Data Collection**
- **Explore the Free Courses**: Visit the [Analytics Vidhya Free Courses](https://courses.analyticsvidhya.com/collections/courses) page and collect course data such as:
  - Course titles
  - Descriptions
  - Curriculum
- **Tools for Data Collection**: Use Python libraries such as `requests` and `BeautifulSoup` (for web scraping), or an API if one is available, to extract the data.
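
The scraping step could be sketched roughly as follows. This is a minimal sketch, not a tested scraper for the live site: the CSS selectors (`a.course-card`, `h3`, `p`) and the helper names `fetch_html` and `parse_courses` are assumptions made for illustration, so inspect the real page markup and adjust them before relying on the output.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://courses.analyticsvidhya.com"
LISTING_URL = BASE_URL + "/collections/courses"

# ASSUMPTION: placeholder selectors. Inspect the live page and replace them.
CARD_SELECTOR = "a.course-card"
TITLE_SELECTOR = "h3"
DESCRIPTION_SELECTOR = "p"


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    import requests  # imported lazily so parsing can be exercised offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_courses(html, base_url=BASE_URL):
    """Extract title, description, and absolute URL from each course card."""
    soup = BeautifulSoup(html, "html.parser")
    courses = []
    for card in soup.select(CARD_SELECTOR):
        title_tag = card.select_one(TITLE_SELECTOR)
        if title_tag is None:
            continue  # skip cards without a recognizable title
        desc_tag = card.select_one(DESCRIPTION_SELECTOR)
        courses.append({
            "title": title_tag.get_text(strip=True),
            "description": desc_tag.get_text(strip=True) if desc_tag else "",
            "url": urljoin(base_url, card.get("href", "")),
        })
    return courses
```

A typical call would be `parse_courses(fetch_html(LISTING_URL))`. Keeping the download and the parsing in separate functions makes it possible to test the parser against saved HTML without touching the network.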
#### 2. **Smart Search System**
- **Frameworks**: Use **LangChain (0.3.x)** or **LlamaIndex (0.12.x)** to implement a Retrieval-Augmented Generation (RAG) system.
- **Steps**:
  1. **Generate Embeddings**:
     - Choose a pre-trained embedding model, such as OpenAI's `text-embedding-ada-002` or a Sentence Transformers model.
     - Convert the collected course data into embeddings.
  2. **Vector Database**:
     - Use a vector database such as Pinecone or Weaviate, or a similarity-search library such as FAISS, to store and query the embeddings.
  3. **Search Implementation**:
     - Implement a natural language search interface that embeds each user query and matches it against the stored embeddings.
     - Return the most relevant courses.
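
To make the embed, store, and query flow concrete, here is a small dependency-free sketch. It is not a LangChain or LlamaIndex implementation: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `CourseIndex` is a hypothetical in-memory class that plays the role of the vector store. In a real build both would be swapped for the chosen model and database, but the ranking idea (cosine similarity between the query vector and each stored vector) is the same.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class CourseIndex:
    """Hypothetical in-memory index standing in for a vector database."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.entries = []  # list of (vector, course) pairs

    def add(self, course):
        # Embed the title and description together as one document.
        text = f"{course['title']} {course.get('description', '')}"
        self.entries.append((self.embed(text), course))

    def search(self, query, k=3):
        # Embed the query, score every stored course, keep the top k matches.
        query_vec = self.embed(query)
        scored = [
            (cosine_similarity(query_vec, vec), course)
            for vec, course in self.entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(score, course) for score, course in scored[:k] if score > 0]
```

Because `CourseIndex` takes the embedding function as a parameter, a real model can be passed in later, provided `cosine_similarity` is adapted to the dense vectors that model returns.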
#### 3. **Deployment on Hugging Face Spaces**
- **Frameworks**: Use `Gradio` or `Streamlit` to create an interactive UI.
- **Deployment Steps**:
  1. Set up a Hugging Face account and create a Space.
  2. Integrate your search tool into the interface.
  3. Deploy the app and obtain a public URL.
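
A Gradio front end for the search tool might look something like the sketch below. The names `format_results` and `build_demo` are hypothetical, and `search_fn` is assumed to be any callable that takes a query string and returns a list of `(score, course)` pairs, where each course is a dict with a `title` key. Treat it as a starting point, not a finished app.

```python
def format_results(results):
    """Render (score, course) pairs as a Markdown list for the UI."""
    if not results:
        return "No matching courses found."
    lines = []
    for score, course in results:
        lines.append(f"- **{course['title']}** (score: {score:.2f})")
    return "\n".join(lines)


def build_demo(search_fn):
    """Wrap a search function in a Gradio interface.

    ASSUMPTION: search_fn(query) returns a list of (score, course) pairs.
    """
    import gradio as gr  # imported here so format_results works without Gradio

    def handle_query(query):
        return format_results(search_fn(query))

    return gr.Interface(
        fn=handle_query,
        inputs=gr.Textbox(label="Search free courses"),
        outputs=gr.Markdown(),
        title="Analytics Vidhya Course Search",
    )


# In the Space's app.py, something like:
# demo = build_demo(my_search_function)  # my_search_function is hypothetical
# demo.launch()
```

For a Gradio Space, this code would normally live in `app.py`, with `gradio` and the other dependencies listed in `requirements.txt`.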
#### 4. **Document Your Approach**
- Write a report explaining:
  - The data collection process.
  - Your choice of embedding model and vector database.
  - The system architecture.
  - The challenges you faced and how you addressed them.
---
### Required Skills and Tools
- **Web Scraping**: `requests`, `BeautifulSoup`
- **Embedding Models**: `sentence-transformers`, OpenAI API
- **Vector Database**: Pinecone, Weaviate, or FAISS
- **Deployment**: `Gradio`, `Streamlit`, Hugging Face Spaces
- **Programming Language**: Python
---
If you need guidance on specific steps, feel free to ask!