Commit History

Update requirements.txt
1fe38fd

daedalus66 committed on

Update requirements.txt
1fcbb8d

HugoLaurencon committed on

Delete explanation_filtering_pipeline.pdf
3ae24a3

HugoLaurencon committed on

Upload explanation_filtering_pipeline.pdf
2be3583

HugoLaurencon committed on

Delete explanation_filtering_pipeline.pdf
3327a22

HugoLaurencon committed on

remove arabic and viet models
836c1b3

HugoLaurencon committed on

Merge branch 'main' of https://huggingface.co/spaces/huggingface/text-data-filtering
f6058aa

HugoLaurencon committed on

back to before portuguese
091dbe4

HugoLaurencon committed on

update visu for Portuguese
2b811ac

HugoLaurencon committed on

add register information
061d2e4

HugoLaurencon committed on

new filter on word repetition ratio
4809033

HugoLaurencon committed on

visualization: small step for the slider on flagged words ratio
fa81556

HugoLaurencon committed on

visualization: choose between several languages
0610f9d

HugoLaurencon committed on

distributions for the filters on words and discarded words by filter
da13b29

HugoLaurencon committed on

visualization: upload our own stop words and flagged words list
5d56c36

HugoLaurencon committed on

everything in expanders
2c2527f

HugoLaurencon committed on

display distributions in sidebar and filtering parameters in expanders
5d485e5

HugoLaurencon committed on

rename badwords to flagged words + new flagged words list of 68 words
f217a73

HugoLaurencon committed on

button to download parameters
bfbcd60

HugoLaurencon committed on

fix division by 0 in compute_special_characters_ratio
b607b76

HugoLaurencon committed on

new tool to analyse our own doc
6f25c5c

HugoLaurencon committed on

filter on repetition removal
693f997

HugoLaurencon committed on

Delete en_examples_with_stats_no_small_docs.json
58d483d

HugoLaurencon committed on

Delete en_examples_with_stats_ldnoob.json
b190ef8

HugoLaurencon committed on

Delete en_examples_with_stats.json
0376199

HugoLaurencon committed on

remove zipf's law and update of the doc
3fd19c1

HugoLaurencon committed on

visu with discarded documents by filter
14574d7

HugoLaurencon committed on

faster visu (less documents)
07c617e

HugoLaurencon committed on