
holiday_testing / utils /useful_terminal_commands.txt



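	find ./csvs -type f -name "*.csv" -exec tail -n +2 {} \; | tr -cd '§' | wc -c | awk '{ print int($1 / 2) }'
	Counts the data rows in the csvs (header lines are skipped by tail -n +2) by counting '§' characters, so it assumes one '§' per row. The result equals the number of labels we have.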
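
The division by 2 in the row-count command is needed because wc -c counts bytes, and '§' takes two bytes in UTF-8. A minimal sketch of that effect, using a made-up three-row sample (the sample data and variable names are illustrative, not taken from the dataset):

```shell
# Hypothetical sample: three data rows, one '§' delimiter per row.
# '§' (U+00A7) is written as its two UTF-8 bytes, octal 302 247.
sample=$(printf 'a\302\2471\nb\302\2472\nc\302\2473\n')
delim=$(printf '\302\247')

# tr keeps only the delimiter bytes; wc -c counts bytes, not characters.
bytes=$(printf '%s' "$sample" | tr -cd "$delim" | wc -c)

# Two bytes per '§', so halve the byte count to get the row count.
rows=$(( bytes / 2 ))
echo "bytes=$bytes rows=$rows"
```

If a row can contain more than one '§', this approach overcounts, so it is only a sanity check under the one-delimiter-per-row assumption.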


	find ./txts -type f -name "*.txt" -exec cat {} \; | wc -w
	Counts the total number of words in the txts.


	The first version of the dataset contains 1,842,816 words and 264 labels, which means each text is approximately 7,000 words long on average.
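
The average above can be checked with shell integer arithmetic. The variable names are illustrative; the two figures are the ones quoted in the note:

```shell
words=1842816   # total words in the first dataset version
labels=264      # number of labels

# Integer division, so the fractional part is dropped.
avg=$(( words / labels ))
echo "$avg"     # prints 6980
```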


	find ./txts -type f -name "*.txt" -exec awk 'length($0) > 20 {gsub(/[^[:alnum:]]/, " "); for (i=1; i<=NF; i++) if (length($i) > 20) print FILENAME ":", $i}' {} \;
	Prints every word longer than 20 characters, prefixed by the name of the file it was found in. Non-alphanumeric characters are treated as word separators.

	find ./txts -type f -name "*.txt" -exec awk 'length($0) > 20 {gsub(/[^[:alnum:]]/, " "); for (i=1; i<=NF; i++) if (length($i) > 20) count++} END {if (count > 0) print FILENAME ":", count}' {} \;
	Prints how many broken words (words longer than 20 characters) there are in each file. Files without any are not listed.
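
A possible variant of the per-file count, sketched under the assumption that starting one awk process per file becomes slow on a large corpus: passing the files in batches with `{} +` and keeping one counter per FILENAME gives the same kind of output with far fewer awk invocations. The function name count_long_words is hypothetical and not part of the original notes.

```shell
# Hypothetical helper: $1 is the directory to scan for .txt files.
count_long_words() {
    find "$1" -type f -name "*.txt" -exec awk '
        length($0) > 20 {
            # Treat every non-alphanumeric character as a separator;
            # assigning to $0 via gsub makes awk re-split the fields.
            gsub(/[^[:alnum:]]/, " ")
            for (i = 1; i <= NF; i++)
                if (length($i) > 20) count[FILENAME]++
        }
        # One line per file that had at least one overlong word
        # (output order is unspecified).
        END { for (f in count) print f ":", count[f] }
    ' {} +
}

# Usage: count_long_words ./txts
```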