Rohil Bansal committed on
Commit c02e015 · 1 Parent(s): 8e61fea
Files changed (1)
  1. course_search.txt +10 -8
course_search.txt CHANGED
@@ -1,12 +1,15 @@
Course Search System: My Implementation Journey

Data Gathering
- I started by collecting course data from various sources like university websites and online platforms. I used web scraping techniques to extract information like course titles, descriptions, prerequisites, and more. I also integrated APIs from some institutions to directly access their course catalogs. Once I had the data, I cleaned and standardized it to ensure consistency.

Choosing the Right Tools
- For processing the text data, I selected a powerful language model like BERT to understand the semantic meaning of course descriptions. This model converted the text into numerical representations (vectors) that computers could easily process.

To efficiently store and search these vectors, I used a vector database called FAISS. It's designed for handling large datasets of vectors and quickly finding the most similar ones.

Building the System
I designed my system to be flexible and scalable. Here's a breakdown of its key components:
@@ -16,12 +19,11 @@ Embedding: The language model processes the course descriptions and creates vect
Vector Database: This stores the vectors for efficient searching.
Search API: This allows users to query the system and get relevant results.
User Interface: This is the front-end where users can interact with the system.
- I deployed the system on a cloud platform to ensure it's reliable and can handle increasing user demand. Each component runs in its own container, making it easy to manage and update.

Challenges and Solutions

- Data Quality: Ensuring data consistency and accuracy was a big challenge. I addressed this by carefully cleaning and standardizing the data.
- Model Performance: Choosing the right language model was crucial. I experimented with different models and fine-tuned them to get the best results.
- Scalability: Handling a large number of courses required a scalable vector database. FAISS was a great choice for this, and I configured it to handle the load.
- User Experience: I focused on making the system user-friendly. I conducted user tests and made improvements to the interface and search algorithm.
- Overall, this project was a great learning experience. I'm proud of what I've accomplished and excited to see how it can help students find the right courses.
 
Course Search System: My Implementation Journey

Data Gathering
+ I started by figuring out how to scrape data from the Analytics Vidhya website. I used bs4 (BeautifulSoup) for web scraping to extract information like course titles, descriptions, prerequisites, and more. Once the fetching code was written, I verified that the data was being scraped correctly and stored it in a structured format for later use.
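In outline, the extraction step looked something like this. The HTML structure and class names below are illustrative, not the real Analytics Vidhya markup, which has its own layout:

```python
# Minimal sketch of the bs4 extraction step. The markup here is hypothetical;
# the real page's selectors have to be discovered by inspecting its HTML.
from bs4 import BeautifulSoup

html = """
<div class="course-card">
  <h3 class="course-title">Introduction to NLP</h3>
  <p class="course-description">Tokenization, embeddings, and more.</p>
</div>
<div class="course-card">
  <h3 class="course-title">Deep Learning Basics</h3>
  <p class="course-description">Neural networks from the ground up.</p>
</div>
"""

def parse_courses(page_html: str) -> list[dict]:
    """Extract course title/description pairs into a structured list."""
    soup = BeautifulSoup(page_html, "html.parser")
    courses = []
    for card in soup.select("div.course-card"):
        courses.append({
            "title": card.select_one(".course-title").get_text(strip=True),
            "description": card.select_one(".course-description").get_text(strip=True),
        })
    return courses

print(parse_courses(html))
```

Keeping the parsed records as a list of dicts made it easy to dump them to JSON or CSV for the embedding step.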

Choosing the Right Tools
+ For processing the text data, I selected a powerful language model, "all-MiniLM-L6-v2", to encode the relevant data as vector representations. This model converted the text into numerical representations that computers could easily process.
+ The all-MiniLM-L6-v2 model is an efficient choice for course search: it provides high-quality 384-dimensional embeddings while being lightweight (about 80 MB), making it well suited to semantic-similarity tasks with a good performance-to-resource ratio.

To efficiently store and search these vectors, I used a vector database called FAISS. It's designed for handling large datasets of vectors and quickly finding the most similar ones.
+ I would have preferred Pinecone, a cloud-based vector database that is efficient as well as very clean and interpretable. But recent updates to Pinecone's servers were causing issues, so with the deadline in mind I went with the second-best option, FAISS.
+ FAISS is a preferred vector database for local storage: it is fast and lightweight, and fairly simple to work with for straightforward tasks.

Building the System
I designed my system to be flexible and scalable. Here's a breakdown of its key components:

Vector Database: This stores the vectors for efficient searching.
Search API: This allows users to query the system and get relevant results.
User Interface: This is the front-end where users can interact with the system.
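The core of the Search API is just "embed the query, rank courses by similarity, return the top matches". Here is a self-contained sketch of that logic, with NumPy standing in for the model-plus-FAISS pair and with made-up titles and 3-dimensional vectors for illustration:

```python
# Minimal sketch of the Search API's ranking logic (cosine similarity).
import numpy as np

titles = ["Intro to NLP", "Deep Learning Basics", "Data Visualization"]
course_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
], dtype=np.float32)

def search(query_vec: np.ndarray, k: int = 2) -> list[str]:
    """Rank courses by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    m = course_vecs / np.linalg.norm(course_vecs, axis=1, keepdims=True)
    scores = m @ q                      # cosine similarity per course
    top = np.argsort(-scores)[:k]       # indices of the k best scores
    return [titles[i] for i in top]

print(search(np.array([1.0, 0.1, 0.0], dtype=np.float32)))
```

In the real system the query vector comes from `model.encode(query)` and the ranking from `index.search`, but the flow is the same.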
+ I deployed the system on Hugging Face Spaces to ensure it's reliable and can handle increasing user demand. Each component runs in its own container, making it easy to manage and update.

Challenges and Solutions

+ Data Quality: Ensuring data consistency and accuracy was a big challenge. I addressed this by carefully cleaning and standardizing the data while scraping. Since scraping text accurately is a tedious task, it took me some time to get it right.
+ Model Performance: Choosing the right language model was crucial. I experimented with different models to get the best results. Ultimately, keeping the use case in mind, I figured it would be best to keep the model lightweight for quick search results.
+ User Experience: I focused on making the system user-friendly. I conducted many tests and made improvements to the interface and search algorithm.
+ Overall, this project was a great learning experience. I'm proud of what I accomplished within the deadline, and I believe I could have improved it given more time. I look forward to discussing this project in an interview with you.