daekeun-ml/Phi-4-multimodal-finetune-ko-speech

phi-4-multimodal

daekeun-ml committed 4 days ago

Commit 4c63337 · verified · 1 Parent(s): 39172c2

Create README.md

Files changed (1)

README.md +40 -0

README.md ADDED

	@@ -0,0 +1,40 @@

+---
+datasets:
+- kresnik/zeroth_korean
+- mozilla-foundation/common_voice_17_0
+- PolyAI/minds14
+metrics:
+- bleu
+- cer
+base_model:
+- microsoft/Phi-4-multimodal-instruct
+language:
+- ko
+license: mit
+tags:
+- korean
+- stt
+- custom_code
+- phi
+- phi-4-multimodal
+---
+# Phi-4-multimodal-finetune-ko-speech
+This model is fine-tuned from [microsoft/Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) for Korean speech-to-text translation, using the following datasets:
+- kresnik/zeroth_korean
+- mozilla-foundation/common_voice_17_0
+- PolyAI/minds14
+- A custom dataset of my own (Korean sentences that I recorded and then transcribed using the Azure Speech-to-text API). The recordings mix fast and slow speech, with some modulation applied using [audiomentations](https://github.com/iver56/audiomentations).
+The datasets total 35K samples. Each sample pairs a Korean speech clip with its transcription. All audio is sampled at 16 kHz.
+The model was trained on a single A100 80GB GPU for 1 epoch with a batch size of 16, using the `sample_finetune_speech.py` script from [microsoft/Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct).
+Note that this model is for PoC/experimental purposes only and is not intended for production use.
+The Phi-4-multimodal model is strong in multimodal tasks, especially speech-to-text, and shows high potential for Korean language tasks. If you are interested in Korean speech-to-text, this model can be a good starting point.
+## Evaluation
+ASR (Automatic Speech Recognition) was evaluated on the zeroth-test set, and ko <-> en speech translation on fleurs. The evaluation script is taken from [here](https://gist.github.com/seastar105/d1d8983b27611370528e3b194dcc5577#file-evaluate-py).
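For readers unfamiliar with the `cer` metric listed in the metadata, the following is a minimal sketch of how a character error rate is commonly computed: the character-level edit distance between the reference transcription and the model output, divided by the reference length. The function names `edit_distance` and `cer` are hypothetical. Stripping spaces and applying NFC normalization are assumptions made for this illustration. The linked evaluation script may normalize text differently, so this is not a reproduction of it.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Assumption: compose Hangul syllables (NFC) and ignore spaces before scoring.
    ref = unicodedata.normalize("NFC", reference).replace(" ", "")
    hyp = unicodedata.normalize("NFC", hypothesis).replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)
```

Called as `cer("안녕하세요", "안녕하세오")`, the function scores one substituted syllable out of five reference characters. Lower is better, and an exact match scores 0.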