metadata
license: cc-by-4.0
DeUnCaser
The output from Automatic Speech Recognition (ASR) software is usually uncased and lacks punctuation, which makes the text hard to read.
The DeUnCaser is a sequence-to-sequence byT5 model that reverses this process. It adds punctuation and capitalises the correct words (in some languages the start of sentences and proper nouns; in other languages, like German, all nouns).
It uses a multilingual base; however, the first test version is trained only on Norwegian. I will add support for other languages on demand.
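
To make the task concrete, here is a minimal sketch of the forward process that the model is meant to reverse: turning properly cased, punctuated text into ASR-style text. The function name `uncase` and the example sentence are hypothetical, and this is only an assumption about how input/target pairs could be built, not a description of how the model was actually trained.

```python
import string


def uncase(text: str) -> str:
    """Simulate raw ASR-style text: no punctuation, no capital letters.

    Hypothetical helper for illustration; not part of the DeUnCaser model.
    """
    # Remove ASCII punctuation; letters such as æ, ø and å are left untouched.
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse whitespace runs left behind by removed punctuation, then lowercase.
    return " ".join(stripped.split()).lower()


# The cased sentence plays the role of the target, the uncased one the input.
target = "Oslo er hovedstaden i Norge. Den ligger ved Oslofjorden."
source = uncase(target)
print(source)  # oslo er hovedstaden i norge den ligger ved oslofjorden
```

Note that this simple version also drops hyphens and apostrophes, so a real preprocessing step might want to treat those characters differently.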
Example input - output
this is a test of a program developed in norway it is actually strange that it is able to do stuff like this even on sequences that are tricky to read by humans