No punctuation
Yes, this is expected. This model was trained on a Russian dataset I had access to, which had been preprocessed with a particular focus in mind. As a result, if I recall correctly, all punctuation was removed and all words were lower-cased. I'm not sure about the artifacts in the words, however.
effort - π
result - π©
So original whisper is just better lol..
If you need case and punctuation, then yes, you should use the original v2 model or the new v3 model.
In uncased, unpunctuated contexts, this model will likely have a lower WER than the original v2 model, particularly in noisy environments. I'm unsure about the v3 model, as I haven't tested it for Russian, but I assume v3 would be better, as it improved substantially on non-English languages.
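For anyone who wants to compare models on equal footing, here is a minimal sketch of an uncased, unpunctuated WER check. It is only an illustration, not the evaluation used for this model: the `normalize` and `wer` helpers and the sample strings are made up for this example, and the WER is a plain word-level edit distance divided by the reference length.

```python
import re


def normalize(text: str) -> str:
    """Lower-case the text and replace punctuation with spaces.

    Hypothetical helper: it approximates the preprocessing described above
    (no case, no punctuation); the real dataset pipeline may have differed.
    """
    text = text.lower()
    # \w is Unicode-aware for str patterns, so Cyrillic letters are kept.
    # Underscore counts as \w, so it is removed explicitly.
    text = re.sub(r"[^\w\s]|_", " ", text)
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming Levenshtein distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: a cased, punctuated reference and an uncased model output.
reference = "Привет, как дела?"
hypothesis = "привет как дела"

raw_wer = wer(reference, hypothesis)
normalized_wer = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw_wer:.2f}, normalized WER: {normalized_wer:.2f}")
```

Without normalization, the uncased output is penalized for every word that differs only in case or attached punctuation, so both the reference and each model's output should go through the same `normalize` step before the scores are compared.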
Can you fine-tune version 3 for Russian?
Unfortunately, I cannot. I no longer have access to the compute resources I used for this.