Cannot be loaded in whisper.cpp

#1
by MonolithFoundation - opened

whisper_model_load: unknown tensor 'model.encoder.conv1.weight' in model file
whisper_init_with_params_no_state: failed to load model

why??

BELLE-2 Group // Be Everyone's Large Language model Engine org

Please provide the specific running script.

Hi, please help confirm. The script used here is simple:

def test_pywhisper():
    from pywhispercpp.model import Model

    # model = Model('base.en', n_threads=6)
    # model = Model("large-v3-turbo", n_threads=6)

    audio_f = "temp/yyxh3_1206/parts_asr/50.9_51.4.wav"
    print(audio_f)
    model = Model('checkpoints/belle_whisper_v3_turbo_ggml/ggml-model.bin', n_threads=6)
    # segments = model.transcribe('data/lei-jun-test.wav')
    # segments = model.transcribe('temp/yyxh3_1206/extracted_audio_clean.wav')
    # segments = model.transcribe('temp/yyxh3_1206/extracted_audio.wav')
    segments = model.transcribe(audio_f, language="zh")
    for segment in segments:
        print(segment.text)

All the default models load OK; only the BELLE model fails to load.

Any help?

I'm using pywhispercpp because it's the only workable whisper.cpp binding. It should behave exactly the same as whisper.cpp, and since the other default models are inferencable, the library itself should be fine.

Running whisper.cpp directly fails too:

$ ./test_model.sh ~/Downloads/14_14_02.WAV
whisper_init_from_file_with_params_no_state: loading model from '~/Workspace/huggingface/Belle-whisper-large-v3-turbo-zh-ggml/ggml-model.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: backends   = 3
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3 (small)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:    Metal total size =   487.23 MB
whisper_model_load: unknown tensor 'model.encoder.conv1.weight' in model file
whisper_init_with_params_no_state: failed to load model
error: failed to initialize whisper context

$ shasum -a 256 ggml-model.bin
fa40644ba8947b91474c6d7c3d760d95693db745205c1feae890f72de5fa1eae  ggml-model.bin

$ cat test_model.sh
asr_engine=~/Workspace/github/whisper.cpp/main
asr_model=~/Workspace/huggingface/Belle-whisper-large-v3-turbo-zh-ggml/ggml-model.bin
init_prompt="转录中文和English内容,补充标点符号"
$asr_engine -m "$asr_model" -l zh -pc -nt -osrt --prompt "$init_prompt" -f "$1"
BELLE-2 Group // Be Everyone's Large Language model Engine org

I am checking the model

BELLE-2 Group // Be Everyone's Large Language model Engine org

The issue is resolved; please download the model again.

Great, it works now.

$ ./test_model.sh ~/Downloads/14_14_02.WAV
whisper_init_from_file_with_params_no_state: loading model from '~/Workspace/huggingface/Belle-whisper-large-v3-turbo-zh-ggml/ggml-model.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: backends   = 3
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:    Metal total size =  1623.92 MB
whisper_model_load: model size    = 1623.92 MB
whisper_backend_init_gpu: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
...

$ shasum -a 256 Belle-whisper-large-v3-turbo-zh-ggml/ggml-model.bin
2a3bba5bfdb4d4da3d9949a83b405711727ca1941d4d5810895e077eb3cb4d99  Belle-whisper-large-v3-turbo-zh-ggml/ggml-model.bin

Hi, were the GGML weights too old for the newest whisper.cpp to load?

BELLE-2 Group // Be Everyone's Large Language model Engine org

The issue arose due to a mismatch in the conversion scripts. I have addressed this bug, and the corrected scripts are now available at this GitHub repository. You can use these updated scripts for converting models without encountering the previous problems.
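For anyone curious what the mismatch looks like in practice: the failing tensor name in the log, 'model.encoder.conv1.weight', is the Hugging Face naming convention, while whisper.cpp's loader looks tensors up by OpenAI-style names without the 'model.' prefix (e.g. 'encoder.conv1.weight'). A conversion script has to rename tensors before writing the ggml file; if it doesn't, loading fails exactly as in the logs above. The sketch below illustrates only the first step of that renaming, as an assumption about the cause; the real converter also remaps attention/MLP submodule names.

```python
# Hypothetical sketch of the tensor-name mismatch behind the
# "unknown tensor" error. Hugging Face Whisper checkpoints prefix
# every tensor with 'model.'; whisper.cpp expects the bare name.

def to_whisper_cpp_name(hf_name: str) -> str:
    """Strip the Hugging Face 'model.' prefix (first step of the mapping;
    a full converter also renames attention/MLP submodules)."""
    prefix = "model."
    return hf_name[len(prefix):] if hf_name.startswith(prefix) else hf_name

print(to_whisper_cpp_name("model.encoder.conv1.weight"))  # encoder.conv1.weight
```

A converter that skips this renaming produces a file whose tensors whisper.cpp cannot find, which is consistent with the fixed scripts resolving the problem.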
