config.json, tokenizer.json etc. files for onnx?
Hi guys, where are you hiding the config.json, tokenizer.json, etc. files for onnx?
Sorry, I cannot understand you.
What are those files?
We don't have such files in Next-gen Kaldi.
Everything in Next-gen Kaldi is open-sourced. We would like to share everything we can.
I mean, when I try to load vosk through onnx and transformers like this:
```js
import { pipeline, env } from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0";

const model = await pipeline('automatic-speech-recognition', 'alphacep/vosk-model-small-ru');
```
it throws errors in the browser console:
```
https://huggingface.co/alphacep/vosk-model-small-ru/resolve/main/preprocessor_config.json 404 (Not Found)
https://huggingface.co/alphacep/vosk-model-small-ru/resolve/main/config.json 404 (Not Found)
https://huggingface.co/alphacep/vosk-model-small-ru/resolve/main/tokenizer_config.json 404 (Not Found)
https://huggingface.co/alphacep/vosk-model-small-ru/resolve/main/tokenizer.json 404 (Not Found)
```
For example, Whisper ('Xenova/whisper-small') works fine with the same code. And if you check the "Files and versions" tab of its HF repo, all these files are there: config, tokenizer, etc.
It seems that in order to work with the onnx/transformers.js pipeline you have to have these files in the repo.
Maybe I'm missing something here? I'm not very familiar with that pipeline, but as an analogue, Whisper works fine and has all these files...
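A quick way to confirm that the files are genuinely absent (and not just at a different path) is to list the repo contents. A minimal sketch using the huggingface_hub Python package:

```python
# Minimal sketch: list the files actually present in the HF repo.
from huggingface_hub import list_repo_files

files = list_repo_files("alphacep/vosk-model-small-ru")
print("\n".join(sorted(files)))
# config.json / tokenizer.json / preprocessor_config.json do not
# appear in the listing, which is why transformers.js gets 404s.
```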
> I mean, when I try to load vosk through onnx and transformers like this:
The model is not designed to be usable in transformers.
Please use https://github.com/k2-fsa/sherpa-onnx
The model can only be used with sherpa-onnx, not elsewhere.
You can find its usage here:
https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition/blob/main/model.py#L434
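For reference, here is a minimal sketch of loading such a model with the sherpa-onnx Python API. The file names (encoder.onnx, decoder.onnx, joiner.onnx, tokens.txt) are assumptions following the usual sherpa-onnx transducer layout; the exact names for this model are in the model.py linked above:

```python
# Minimal sketch, not the exact setup for this model: the file names
# below are assumptions; check the linked model.py for the real ones.
import wave

import numpy as np
import sherpa_onnx

recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
    encoder="encoder.onnx",
    decoder="decoder.onnx",
    joiner="joiner.onnx",
    tokens="tokens.txt",
    num_threads=1,
)

# Read a mono 16-bit wav and normalize samples to float32 in [-1, 1].
with wave.open("test.wav") as f:
    samples = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
    samples = samples.astype(np.float32) / 32768
    sample_rate = f.getframerate()

stream = recognizer.create_stream()
stream.accept_waveform(sample_rate, samples)
recognizer.decode_stream(stream)
print(stream.result.text)
```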
Could you tell us where you found instructions saying that you can use the model with transformers?
Ah, got it. Thanks a lot, guys, for the explanation and links!
It would be great anyway if one day Vosk could be reached via transformers.js too! :)
I am afraid there is no plan to support that.
Our target is C++ deployment.