clu-ling/whisper-large-v2-japanese-5k-steps · All numbers are converted to Chinese numerals.

Jun 28, 2024

Because the model converts all numbers to Chinese (kanji) numerals, the transcripts are difficult to read and impractical.

clu-ling org Jun 28, 2024

@cookiexND, unfortunately our lab has no plans to publish an updated version of this model, since we're no longer using it in any applications.

If you're interested in fine-tuning a model that represents numerals the way you want, I would suggest modifying the transcripts for Common Voice v18 (https://commonvoice.mozilla.org/ja/datasets) to match your desired output (see https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/raw/main/transcript/ja/train.tsv for a sample). You'll probably want to make sure you have enough examples where kanji numerals should not be converted to Arabic numerals (e.g. the personal name 伊丹十三 → 伊丹十三 vs. 三種類 → 3種類). To get that sort of coverage, you may need to record additional data to supplement Common Voice.
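As a rough sketch of the transcript-editing step, something like the following could rewrite one column of a tab-separated transcript file. The function name `rewrite_transcripts`, the `normalize` callback, and the assumption that the text lives in a column called `sentence` are illustrative; check them against the actual TSV header before relying on this.

```python
def rewrite_transcripts(lines, normalize, column="sentence"):
    """Yield TSV lines with `normalize` applied to one column.

    `lines` is any iterable of text lines (for example an open file).
    The first line is treated as the header; `column` is assumed to be
    the name of the transcript column (an assumption, verify it).
    """
    lines = iter(lines)
    header = next(lines).rstrip("\n").split("\t")
    idx = header.index(column)  # raises ValueError if the column is missing
    yield "\t".join(header) + "\n"
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Leave malformed (too short) rows untouched.
        if len(fields) > idx:
            fields[idx] = normalize(fields[idx])
        yield "\t".join(fields) + "\n"
```

To use it on a file, open the source TSV with `encoding="utf-8"`, pass the file object as `lines`, and write the yielded lines to a new file. The `normalize` callback is where your own numeral rules go, including the cases that must stay unchanged.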

An alternative would be to post-process your output using something like https://github.com/nagataaaas/Kanjize (you could also fine-tune a seq2seq transformer or a generative LLM to accomplish this).
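To illustrate the post-processing idea, here is a minimal sketch in plain Python. It does not use Kanjize's API; it is a hand-rolled stand-in, and `kanji_to_int`, `convert_text`, and the `protected` word list are hypothetical names. It only knows 〇 through 九, 十, 百, 千, 万 and 億, and it converts kanji numerals wherever they appear unless the surrounding word is listed in `protected`, so place names such as 千葉 would need to be listed too.

```python
import re

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8}

# Matches any run of the kanji numeral characters handled above.
KANJI_NUM = re.compile(
    "[" + "".join(list(DIGITS) + list(SMALL_UNITS) + list(LARGE_UNITS)) + "]+"
)


def kanji_to_int(s):
    """Convert a run of kanji numerals to an int (simplified rules)."""
    total = 0    # sum of completed 万 / 億 groups
    section = 0  # value accumulated inside the current group
    digit = 0    # pending digit(s) not yet multiplied by a unit
    for ch in s:
        if ch in DIGITS:
            # Consecutive digits are read positionally, e.g. 二〇二四.
            digit = digit * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        else:
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = digit = 0
    return total + section + digit


def convert_text(text, protected=()):
    """Replace kanji numerals with Arabic numerals, skipping protected words."""
    if protected:
        pattern = "(" + "|".join(re.escape(w) for w in protected) + ")"
        parts = re.split(pattern, text)  # capturing group keeps the words
    else:
        parts = [text]
    keep = set(protected)
    out = []
    for part in parts:
        if part in keep:
            out.append(part)  # e.g. a personal name, left as-is
        else:
            out.append(KANJI_NUM.sub(lambda m: str(kanji_to_int(m.group())), part))
    return "".join(out)


if __name__ == "__main__":
    print(convert_text("伊丹十三は三種類", protected=["伊丹十三"]))  # 伊丹十三は3種類
```

A fixed word list is the simplest way to guard names, but it will not scale to open vocabulary; that is where the fine-tuned seq2seq or LLM approach mentioned above would come in.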

cookiexND

Jun 28, 2024

Thank you for your kind reply. Understood.

cookiexND changed discussion status to closed Jun 28, 2024