Update README.md
Browse files
README.md
CHANGED
@@ -27,112 +27,9 @@ more accurate representation for Japanese.
|
|
27 |
|
28 |
Don't use this model without the post-processing functions I wrote below, or you'll get less than ideal performance. Check the notebook.
|
29 |
|
30 |
-
## Inference and Post-proc
|
31 |
|
32 |
-
|
33 |
-
|
34 |
-
# this function was borrowed and modified from Aaron Yinghao Li, the author of the StyleTTS paper.
|
35 |
-
|
36 |
-
from datasets import Dataset, Audio
|
37 |
-
from transformers import WhisperProcessor, WhisperForConditionalGeneration
|
38 |
-
import re
|
39 |
-
import pykakasi
|
40 |
-
|
41 |
-
kana_mapper = dict([
|
42 |
-
("ゔぁ","ba"),
|
43 |
-
.
|
44 |
-
.
|
45 |
-
.
|
46 |
-
etc. # Take a look at the Notebook for the whole code
|
47 |
-
("ぉ"," o"),
|
48 |
-
("ゎ"," ɯa"),
|
49 |
-
("ぉ"," o"),
|
50 |
-
|
51 |
-
("を","o")
|
52 |
-
])
|
53 |
-
|
54 |
-
|
55 |
-
def post_fix(text):
|
56 |
-
orig = text
|
57 |
-
|
58 |
-
for k, v in kana_mapper.items():
|
59 |
-
text = text.replace(k, v)
|
60 |
-
|
61 |
-
return text
|
62 |
-
|
63 |
-
|
64 |
-
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
|
65 |
-
model = WhisperForConditionalGeneration.from_pretrained("Respair/Hibiki_ASR_Phonemizer").to("cuda:0")
|
66 |
-
|
67 |
-
forced_decoder_ids = processor.get_decoder_prompt_ids(task="transcribe", language='japanese')
|
68 |
-
|
69 |
-
|
70 |
-
|
71 |
-
|
72 |
-
def convert_to_kana(text):
|
73 |
-
kks = pykakasi.kakasi()
|
74 |
-
|
75 |
-
|
76 |
-
def convert_word(word):
|
77 |
-
result = kks.convert(word)
|
78 |
-
return ''.join(item['hira'] for item in result)
|
79 |
-
|
80 |
-
|
81 |
-
parts = re.split(r'([^\u3000-\u30ff\u3400-\u4dbf\u4e00-\u9fff]+)', text)
|
82 |
-
|
83 |
-
|
84 |
-
converted_parts = [convert_word(part) if re.match(r'[\u3000-\u30ff\u3400-\u4dbf\u4e00-\u9fff]', part) else part for part in parts]
|
85 |
-
|
86 |
-
return ''.join(converted_parts)
|
87 |
-
|
88 |
-
|
89 |
-
sample = Dataset.from_dict({"audio": ["/content/kl_chunk1987.wav"]}).cast_column("audio", Audio(16000))
|
90 |
-
sample = sample[0]['audio']
|
91 |
-
|
92 |
-
# Ensure the input features are on the same device as the model
|
93 |
-
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt").input_features.to("cuda:0")
|
94 |
-
|
95 |
-
# generate token ids
|
96 |
-
predicted_ids = model.generate(input_features,forced_decoder_ids=forced_decoder_ids, repetition_penalty=1.2)
|
97 |
-
# decode token ids to text
|
98 |
-
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
|
99 |
-
|
100 |
-
|
101 |
-
# You can add your final adjustments here, it's better to write a dict though, but I'm just giving you a quick demonstration here.
|
102 |
-
|
103 |
-
if ' neɽitai ' in transcription[0]:
|
104 |
-
transcription[0] = transcription[0].replace(' neɽitai ', "naɽitai")
|
105 |
-
|
106 |
-
if 'harɯdʑisama' in transcription[0]:
|
107 |
-
transcription[0] = transcription[0].replace('harɯdʑisama', "arɯdʑisama")
|
108 |
-
|
109 |
-
|
110 |
-
if 'tɕabiʔto' in transcription[0]:
|
111 |
-
transcription[0] = transcription[0].replace('tɕabiʔto', "tɕabiʔto")
|
112 |
-
|
113 |
-
|
114 |
-
if "ki ni ɕinai" in transcription[0]:
|
115 |
-
transcription[0] = re.sub(r'(?<!\s)ki ni ɕinai', r' ki ni ɕinai', transcription[0])
|
116 |
-
|
117 |
-
if 'ʔt' in transcription[0]:
|
118 |
-
transcription[0] = re.sub(r'(?<!\s)ʔt', r'ʔt', transcription[0])
|
119 |
-
|
120 |
-
if 'de aɽoɯ' in transcription[0]:
|
121 |
-
transcription[0] = re.sub(r'(?<!\s)de aɽoɯ', r' de aɽoɯ', transcription[0])
|
122 |
-
|
123 |
-
if ".ʔ" in transcription[0]:
|
124 |
-
transcription[0] = transcription[0].replace(".ʔ","..")
|
125 |
-
|
126 |
-
if "ʔ." in transcription[0]:
|
127 |
-
transcription[0] = transcription[0].replace("ʔ.",".")
|
128 |
-
|
129 |
-
transcription[0] = convert_to_kana(transcription[0]) # Ensuring the model won't hallucinate and accidentally return kana / kanji.
|
130 |
-
|
131 |
-
post_fix(transcription[0].lstrip())
|
132 |
-
|
133 |
-
```
|
134 |
-
|
135 |
-
The full code -> [Notebook](https://colab.research.google.com/drive/13tx8WKzkvePFdtKU4WUE_iYyYCqTY8dZ#scrollTo=5XqUs-sPdT79)
|
136 |
|
137 |
## Intended uses & limitations
|
138 |
|
|
|
27 |
|
28 |
Don't use this model without the post-processing functions I wrote below, or you'll get less than ideal performance. Check the notebook.
|
29 |
|
30 |
+
## Inference and Post-proc
|
31 |
|
32 |
+
Check here -> [Notebook](https://colab.research.google.com/drive/13tx8WKzkvePFdtKU4WUE_iYyYCqTY8dZ#scrollTo=5XqUs-sPdT79)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
33 |
|
34 |
## Intended uses & limitations
|
35 |
|