Commit History

Corrected minor Dockerfile package version issue
593153e

seanpedrickcase committed on

App now retains original index following cleaning to allow for referring back to original data
90553eb

seanpedrickcase committed on

Now installed dependencies into correct folder in Dockerfile
5888649

seanpedrickcase committed on

Finally managed to enforce CPU torch install in Dockerfile
97913c4

seanpedrickcase committed on

Further optimised Dockerfile and requirements (smaller torch installation now hopefully)
00db72b

seanpedrickcase committed on

Transferring across installed packages from build stage in Dockerfile
c9da99d

seanpedrickcase committed on

Changed Dockerfile to multi-stage build to further reduce size
0fd155c

seanpedrickcase committed on

Trying to make container image smaller through Dockerfile
7d5387e

seanpedrickcase committed on

Minor changes to reduce Dockerfile size
b767539

seanpedrickcase committed on

Updated download_model.py to download PyTorch .bin file
1c0bfd4

seanpedrickcase committed on

Removed some requirements from Dockerfile for AWS deployment to reduce container size
51ba1cb

seanpedrickcase committed on

Added NUMBA_CACHE_DIR to Docker environment variables
cd6a3e0

seanpedrickcase committed on

Allowed the app, when running on AWS, to use a smaller embedding model and to skip loading the representation LLM (due to size restrictions).
22ca76e

seanpedrickcase committed on

Dockerfile now installs models directly into user folder instead of moving from base folder
3c1c3de

seanpedrickcase committed on

Updated Gradio version for Spaces. Updated Dockerfile to enable llama.cpp build with CMake
d34af22

seanpedrickcase committed on

Only aggregate topics that are not 'other'. Allowed for a minimum sentence length. Default max_topics now auto-aggregates topics. Added Cognito Auth functionality (boto3 with AWS).
1e2bb3e

seanpedrickcase committed on

Can split passages into sentences. Improved embedding and LLM representation models; improved zero-shot capabilities
55f0ce3

seanpedrickcase committed on

Updated packages. Improved hierarchy visualisation. Better models (mixedbread and Phi-3). Now an option to split texts into sentences before modelling.
04a15c5

seanpedrickcase committed on

Minor cleaning, csv formatting changes
d80c8f5

Sean-Case committed on

Reduce outliers is now more efficient and relabels with the correct vectoriser. Default topic labels are now tidier. Hierarchical topic outputs are more useful for joining to the df afterwards. Switched the low resource reduction algorithm to UMAP, as the default is not good.
e1c1f68

Sonnyjim committed on

Should now parse custom regex correctly. Will now wipe previously created embeddings if 'low resource mode' option switched.
0a543a0

Sean-Case committed on

Allowed for uploading custom regex for cleaning. Fixed calculate all probabilities, reduce outliers. Added text tree for hierarchical modelling.
381f959

Sonnyjim committed on

Upgraded to Gradio 4.16.0. Guide for converting to exe added.
0a177ca

Sonnyjim committed on

Hopefully now LLM download from hub should work
cdcd7af

Sonnyjim committed on

Note about LLM not working now successfully added!
e2dfc1e

Sean-Case committed on

Added note to say that LLM representation is not currently working on the HF website
3b4333f

Sean-Case committed on

Trying to download LLM to local_dir instead of cache_dir
539aba9

Sean-Case committed on

LLM model save is failing in Hugging Face; attempting instead to save to base folder
c2bf185

Sean-Case committed on

Some text changes. Fixed a couple of TF-IDF embeddings issues
87306c7

Sean-Case committed on

Switched embeddings to low resource TF-IDF by default. Some text changes.
a7fdf3b

Sean-Case committed on

Fixed file load with files including capital letters
9c6425d

Sonnyjim committed on

Added clean data options, improved re-representation options and visualisation. General format changes
4effac0

Sonnyjim committed on

Allowed for loading in external topic labels. A few visualisation modifications.
b27bab2

Sonnyjim committed on

Model save now checks and makes a folder before writing the model
356791c

Sonnyjim committed on

Lots of general fixes. New visualisations, fixed hierarchical vis for zero shot. Added calc all probabilities.
b4510a6

Sonnyjim committed on

Changed Phi model to smaller StableLM 2 1.6. Fixed a None type detection error.
1f1a1c7

Sonnyjim committed on

Disabled console logging as it was getting in the way of file load into the app
731ed23

Sonnyjim committed on

Switched embeddings model to BGE Small 1.5 as Jina seemed unable to do zero shot topic modelling properly
be094ee

Sonnyjim committed on

Added minimum similarity slider for zero shot topic modelling
0fe5421

Sean-Case committed on

Model and hierarchy details should now save properly
6622531

Sonnyjim committed on

Split off LLM representation, visualisation, and reduce outliers from main function. Added hierarchical visualisation and logs
5d87c3c

Sonnyjim committed on

More efficient embeddings save and representations load/process. Custom visualisation hover option added, formatting improvements. Version 0.1?
ffe5eb2

Sonnyjim committed on

App should now check if embeddings are loaded before topic modelling. And will save only once.
9eeba1e

Sonnyjim committed on

Hopefully fixed install and load of LLM model on systems without a HF_HOME environment variable
32cf9fb

Sean-Case committed on

Added option to reduce outliers based on closest topic
e09dd3b

Sonnyjim committed on

Returned TruncatedSVD components to 100 - higher values don't seem to help
43ac0d8

Sean-Case committed on

Greatly increased low resource process dimensions for higher quality. Visualisations disabled by default to increase speed.
fac3624

Sean-Case committed on

Greatly improved low resource mode speed (at cost of potential quality)
aa3df37

Sean-Case committed on

Changed zero shot min similarity to 0.5
0b7839c

Sonnyjim committed on

Added controls for saving topic models and visualisation. Removed custom UMAP layer
81f1b56

Sonnyjim committed on