Commit History
Now importing the spaces package near the start of the app to avoid an issue with CUDA being initialised before it
9e84863
Llama-cpp-python in GPU mode doesn't seem to work well with BERTopic on Hugging Face, so downgrading it to the CPU version
88d81fa
Rearranged functions for embeddings creation to be compatible with zero GPU space. Updated packages.
cc495e1
Added and replaced relevant files to download in download_model.py to allow for app use on AWS
49e0db8
Updated Dockerfile with latest packages
08eb30d
Added example of how to run function from command line. Updated packages. Embedding model default now smaller and at fp16.
34f1e83
Improved initial clean options. Now has option to return embeddings only.
89c4d20
Corrected minor Dockerfile package version issue
593153e
App now retains original index following cleaning to allow for referring back to original data
90553eb
Now installs dependencies into the correct folder in Dockerfile
5888649
Finally managed to enforce cpu torch install in Dockerfile
97913c4
Further optimised Dockerfile and requirements (smaller torch installation now hopefully)
00db72b
Transferring across installed packages from build stage in Dockerfile
c9da99d
Changed Dockerfile to multi-stage build to further reduce size
0fd155c
Trying to make container image smaller through Dockerfile
7d5387e
Minor changes to reduce Dockerfile size
b767539
Updated download_model.py to download pytorch .bin file
1c0bfd4
Removed some requirements from Dockerfile for AWS deployment to reduce container size
51ba1cb
Added NUMBA_CACHE_DIR to Docker environment variables
cd6a3e0
Allowed for app running on AWS to use smaller embedding model and not to load representation LLM (due to size restrictions).
22ca76e
Dockerfile now installs models directly into user folder instead of moving from base folder
3c1c3de
Updated Gradio version for Spaces. Updated Dockerfile to enable llama.cpp build with CMake
d34af22
Only aggregate topics not 'other', allowed for minimum sentence length, default max_topics now will auto aggregate topics. Added Cognito Auth functionality (boto3 with AWS).
1e2bb3e
Can split passages into sentences. Improved embedding, LLM representation models, improved zero shot capabilities
55f0ce3
Updated packages. Improve hierarchy vis. Better models - mixedbread and phi3. Now option to split texts into sentences before modelling.
04a15c5
Minor cleaning, csv formatting changes
d80c8f5
Sean-Case
committed on
Reduce outliers now more efficient and relabels with correct vectoriser. Default topic labels now tidier. Hierarchical topic outputs more useful for joining to df afterwards. Switched low resource reduction algorithm to UMAP as the default is not good.
e1c1f68
Should now parse custom regex correctly. Will now wipe previously created embeddings if 'low resource mode' option switched.
0a543a0
Allowed for uploading custom regex for cleaning. Fixed calculate all probabilities, reduce outliers. Added text tree for hierarchical modelling.
381f959
Upgraded to Gradio 4.16.0. Guide for converting to exe added.
0a177ca
Hopefully now LLM download from hub should work
cdcd7af
Note about LLM not working now successfully added!
e2dfc1e
Added note to say that LLM representation is not currently working on the HF website
3b4333f
Trying to download LLM to local_dir instead of cache_dir
539aba9
LLM model save is failing on Hugging Face - attempting instead to save to base folder
c2bf185
Some text changes. Fixed a couple of TF-IDF embeddings issues
87306c7
Switched embeddings to low resource TF-IDF by default. Some text changes.
a7fdf3b
Fixed file load with files including capital letters
9c6425d
Added clean data options, improved re-representation options and visualisation. General format changes
4effac0
Allowed for loading in external topic labels. A few visualisation modifications.
b27bab2
Model save now checks and makes a folder before writing the model
356791c
Lots of general fixes. New visualisations, fixed hierarchical vis for zero shot. Added calc all probabilities.
b4510a6
Changed Phi model to smaller StableLM 2 1.6. Fixed a None type detection error.
1f1a1c7
Disabled console logging as it was getting in the way of file load into the app
731ed23
Switched embeddings model to BGE Small 1.5 as Jina seemed unable to do zero shot topic modelling properly
be094ee
Added minimum similarity slider for zero shot topic modelling
0fe5421