This assignment involves building a **smart search tool** for Analytics Vidhya's free courses. Here's a breakdown of how you can approach it:
### Steps to Complete the Assignment
#### 1. **Data Collection**
- **Explore the Free Courses**: Visit the [Analytics Vidhya Free Courses](https://courses.analyticsvidhya.com/collections/courses) and collect course-related data such as:
  - Course Titles
  - Descriptions
  - Curriculum
- **Tools for Data Collection**: Use Python libraries like `BeautifulSoup` (for web scraping) or APIs (if available) to extract the data.
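
The extraction step above can be sketched as follows. This is a minimal illustration, not the site's actual structure: the sample HTML, the `course-card` class, and the `CourseCardParser` and `parse_courses` names are all hypothetical, and the sketch uses the standard library's `html.parser` instead of `BeautifulSoup` so that it runs without extra installs.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real course pages may use different tags and classes.
SAMPLE_HTML = """
<ul>
  <li class="course-card"><h3>Intro to Python</h3><p>Learn Python basics.</p></li>
  <li class="course-card"><h3>Getting Started with NLP</h3><p>Tokenization and embeddings.</p></li>
</ul>
"""


class CourseCardParser(HTMLParser):
    """Collect {title, description} dicts from <li class="course-card"> items."""

    def __init__(self):
        super().__init__()
        self.courses = []
        self._field = None  # field that the text currently being read belongs to

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "course-card") in attrs:
            # A new card starts: open an empty record for it.
            self.courses.append({"title": "", "description": ""})
        elif tag == "h3" and self.courses:
            self._field = "title"
        elif tag == "p" and self.courses:
            self._field = "description"

    def handle_endtag(self, tag):
        if tag in ("h3", "p"):
            self._field = None

    def handle_data(self, data):
        # Text outside <h3>/<p> (such as whitespace between tags) is ignored.
        if self._field and self.courses:
            self.courses[-1][self._field] += data.strip()


def parse_courses(html):
    """Return a list of course records found in an HTML string."""
    parser = CourseCardParser()
    parser.feed(html)
    return parser.courses


courses = parse_courses(SAMPLE_HTML)
```

In a real run, the HTML string would come from fetching the collection page (for example with `requests`), and the tag and class checks would need to be adjusted after inspecting the page source.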
#### 2. **Smart Search System**
- **Frameworks**: Use **LangChain (0.3.x)** or **LlamaIndex (0.12.x)** to implement a Retrieval-Augmented Generation (RAG) system.
- **Steps**:
  1. **Generate Embeddings**:
     - Choose a pre-trained embedding model like OpenAI's `text-embedding-ada-002` or Sentence Transformers.
     - Convert the collected course data into embeddings.
  2. **Vector Database**:
     - Use a managed vector database such as Pinecone or Weaviate, or a local similarity-search library such as FAISS, to store and query embeddings.
  3. **Search Implementation**:
     - Implement a natural language search interface in which each user query is embedded with the same model and matched against the stored embeddings.
     - Return the most relevant courses.
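
The embed, store, and search steps above can be sketched end to end in plain Python. This is a toy under loud assumptions: `embed` is a word-count stand-in rather than a neural embedding model, the `INDEX` list is an in-memory substitute for a vector store, and the course records and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy course records; real ones would come from the data-collection step.
COURSES = [
    {"title": "Introduction to Python",
     "description": "Learn Python programming basics for data science."},
    {"title": "Getting Started with NLP",
     "description": "Text processing, tokenization and word embeddings."},
    {"title": "Tableau for Beginners",
     "description": "Build dashboards and data visualization charts."},
]


def embed(text):
    """Stand-in embedding: a sparse word-count vector (NOT a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# "Index": one precomputed vector per course, scanned linearly at query time.
INDEX = [(c, embed(c["title"] + " " + c["description"])) for c in COURSES]


def search(query, k=2):
    """Return up to k (title, score) pairs, best match first, dropping zero scores."""
    q = embed(query)  # the query goes through the same embedding as the documents
    scored = sorted(((cosine(q, v), c["title"]) for c, v in INDEX), reverse=True)
    return [(title, round(score, 3)) for score, title in scored[:k] if score > 0]
```

Calling `search("data visualization dashboards")` ranks "Tableau for Beginners" first. In the real system, `embed` would be replaced by a model call (for instance `SentenceTransformer.encode` from `sentence-transformers`) and the linear scan by a query against the chosen vector store; the ranking logic stays the same.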
#### 3. **Deployment on Hugging Face Spaces**
- **Frameworks**: Use `Gradio` or `Streamlit` to create an interactive UI.
- **Deployment Steps**:
  1. Set up a Hugging Face account and create a new Space.
  2. Integrate your search tool into the interface.
  3. Deploy the app and obtain a public URL.
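
A Gradio front end for the search tool might look like the sketch below. The catalogue, the keyword matcher in `search_courses`, and the `build_demo` helper are hypothetical placeholders; the matcher only stands in for the real retrieval call.

```python
# Toy catalogue and matcher so the sketch is self-contained (hypothetical data).
COURSE_TITLES = [
    "Introduction to Python",
    "Getting Started with NLP",
    "Tableau for Beginners",
]


def search_courses(query):
    """Return matching titles, one per line; swap in the real retrieval call here."""
    words = query.lower().split()
    hits = [t for t in COURSE_TITLES if any(w in t.lower() for w in words)]
    return "\n".join(hits) if hits else "No matching courses found."


def build_demo():
    """Wrap search_courses in a simple text-in, text-out Gradio interface."""
    import gradio as gr  # imported lazily so the search logic can be tested without Gradio

    return gr.Interface(
        fn=search_courses,
        inputs=gr.Textbox(label="What do you want to learn?"),
        outputs=gr.Textbox(label="Matching courses"),
        title="Course search",
    )


# In the Space's app.py, finish with: build_demo().launch()
```

For a Gradio Space, this code would typically live in `app.py`, with `gradio` and any retrieval dependencies listed in `requirements.txt`; the Space then builds and serves the app at its public URL.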
#### 4. **Document Your Approach**
- Write a report explaining:
  - Data collection process.
  - Embedding model and vector database choices.
  - System architecture.
  - Challenges faced and how you addressed them.
---
### Required Skills and Tools
- **Web Scraping**: `requests`, `BeautifulSoup`
- **Embedding Models**: `sentence-transformers`, OpenAI API
- **Vector Storage**: Pinecone or Weaviate (managed databases), or FAISS (local index library)
- **Deployment**: `Gradio`, `Streamlit`, Hugging Face Spaces
- **Programming Language**: Python
---
If you need guidance on specific steps, feel free to ask!