Rohil Bansal committed on
Commit dd0bddd · 1 Parent(s): c02e015

Scraper improved.

course_search/scraper/__pycache__/__init__.cpython-311.pyc CHANGED
Binary files a/course_search/scraper/__pycache__/__init__.cpython-311.pyc and b/course_search/scraper/__pycache__/__init__.cpython-311.pyc differ
 
course_search/scraper/__pycache__/__init__.cpython-312.pyc DELETED
Binary file (161 Bytes)
 
course_search/scraper/__pycache__/course_scraper.cpython-311.pyc CHANGED
Binary files a/course_search/scraper/__pycache__/course_scraper.cpython-311.pyc and b/course_search/scraper/__pycache__/course_scraper.cpython-311.pyc differ
 
course_search/scraper/__pycache__/course_scraper.cpython-312.pyc DELETED
Binary file (5.03 kB)
 
course_search/scraper/course_scraper.py CHANGED
@@ -62,7 +62,7 @@ class CourseScraper:
         course_info['title'] = title_elem.text.strip()
 
     # Extract description
-    desc_elem = soup.find('div', class_='rich-text__container')
+    desc_elem = soup.find('div', class_='rich-text section-height__medium')
     if desc_elem:
         course_info['description'] = desc_elem.text.strip()
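Note on the new selector: when `class_` is given a string containing a space, BeautifulSoup compares it against the full `class` attribute value, so `'rich-text section-height__medium'` only matches if the element carries exactly those two classes in that order. A minimal sketch of matching both classes regardless of order or extra classes, using a CSS selector. The HTML snippet and the `extract_description` helper are hypothetical; the real page markup is not part of this commit.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes, different order, plus one extra class.
HTML = """
<div class="section-height__medium rich-text extra">Course description here.</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Exact-string match on the class attribute: fails for the markup above.
exact = soup.find("div", class_="rich-text section-height__medium")

# CSS selector: requires both classes, ignores order and additional classes.
by_selector = soup.select_one("div.rich-text.section-height__medium")


def extract_description(soup):
    """Return the description text, or None if the element is missing."""
    elem = soup.select_one("div.rich-text.section-height__medium")
    return elem.get_text(strip=True) if elem else None


description = extract_description(soup)
```

If the site always emits the classes in the order used in the commit, the committed `find` call works as is; the selector form is just less sensitive to markup changes.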
 
course_search/search_system/__pycache__/__init__.cpython-311.pyc CHANGED
Binary files a/course_search/search_system/__pycache__/__init__.cpython-311.pyc and b/course_search/search_system/__pycache__/__init__.cpython-311.pyc differ
 
course_search/search_system/__pycache__/__init__.cpython-312.pyc DELETED
Binary file (167 Bytes)
 
course_search/search_system/__pycache__/data_pipeline.cpython-311.pyc CHANGED
Binary files a/course_search/search_system/__pycache__/data_pipeline.cpython-311.pyc and b/course_search/search_system/__pycache__/data_pipeline.cpython-311.pyc differ
 
course_search/search_system/__pycache__/data_pipeline.cpython-312.pyc DELETED
Binary file (2.6 kB)
 
course_search/search_system/__pycache__/embeddings.cpython-311.pyc CHANGED
Binary files a/course_search/search_system/__pycache__/embeddings.cpython-311.pyc and b/course_search/search_system/__pycache__/embeddings.cpython-311.pyc differ
 
course_search/search_system/__pycache__/embeddings.cpython-312.pyc DELETED
Binary file (2.65 kB)
 
course_search/search_system/__pycache__/rag_system.cpython-311.pyc CHANGED
Binary files a/course_search/search_system/__pycache__/rag_system.cpython-311.pyc and b/course_search/search_system/__pycache__/rag_system.cpython-311.pyc differ
 
course_search/search_system/__pycache__/vector_store.cpython-311.pyc CHANGED
Binary files a/course_search/search_system/__pycache__/vector_store.cpython-311.pyc and b/course_search/search_system/__pycache__/vector_store.cpython-311.pyc differ
 
course_search/search_system/__pycache__/vector_store.cpython-312.pyc DELETED
Binary file (4.05 kB)
 
data/cache/course_embeddings.npy ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8f4d91de39bffed6c0291356791966feaaf2314c2d3391975038eec443e570f8
+size 40064
data/cache/faiss_index.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:406a21e75f0af486e6bef5f8270daf482f2b7a7c1cf45d1091aabc25455a4ba7
+size 39981
data/courses.pkl ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f2945792f4e7f95855e4f48bcf3977aa52ec3156242e70efcaaf07f44864267a
+size 77762
data/courses_with_embeddings.pkl CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c55c9ffb5774c19e7443e3f1c5b8a8c0add246c1516f05f80d29aa0433972f61
-size 139388
+oid sha256:ee32928e02155e79620c72b032cd166fefc8de20dd648b65cea7aa3a6b2641bf
+size 118516
data/embedding_cache/embeddings_cache_all-MiniLM-L6-v2.pkl CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5444508e05b63d662919fc358bfed9dea64094858d01a6df9b291a56525ea13d
-size 123513
+oid sha256:c05df73cb5ed569e2c252f4b6fed91aec245f8634b7023cfd134dfd58bb3024a
+size 166317
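
The data files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A minimal sketch of reading such a pointer and checking downloaded bytes against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository; the pointer text is the one added for data/cache/course_embeddings.npy.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 named by the pointer."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError("unsupported hash algorithm: " + pointer["hash_algo"])
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )


POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:8f4d91de39bffed6c0291356791966feaaf2314c2d3391975038eec443e570f8
size 40064
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer(open(path, "rb").read(), pointer)` on a fetched file is one way to confirm that the real object, rather than the pointer text, is on disk.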