|
--- |
|
license: apache-2.0 |
|
language: |
|
- 'no' |
|
- nb |
|
- nn |
|
- en |
|
datasets: |
|
- NbAiLab/ncc_speech |
|
- NbAiLab/NST |
|
- NbAiLab/NPSC |
|
base_model: NbAiLab/nb-whisper-large-distil-turbo-beta |
|
tags: |
|
- audio |
|
- asr |
|
- automatic-speech-recognition |
|
metrics: |
|
- wer |
|
- cer |
|
library_name: transformers |
|
pipeline_tag: automatic-speech-recognition |
|
widget: |
|
- src: https://datasets-server.huggingface.co/assets/google/fleurs/--/nb_no/train/1/audio/audio.mp3 |
|
example_title: FLEURS sample 1 |
|
- src: https://datasets-server.huggingface.co/assets/google/fleurs/--/nb_no/train/4/audio/audio.mp3 |
|
example_title: FLEURS sample 2 |
|
--- |
|
|
|
|
|
|
|
# NB-Whisper Large Distil Turbo
|
|
|
Introducing **_NB Whisper Large Distil Turbo_**, a lighter, faster version of the Norwegian ASR model developed by the National Library of Norway. This distilled model maintains strong transcription quality while being optimized for resource-constrained environments. |
|
It is derived from the original NB-Whisper Large model through a distillation process, reducing the number of parameters while preserving performance for Automatic Speech Recognition (ASR) tasks. |
|
|
|
--- |
|
|
|
## Model Summary |
|
|
|
- **Model Size:** Reduced from 1550M parameters (Large) to 756M parameters (distilled).
|
- **Languages Supported:** Norwegian (Bokmål). |
|
- **Base Model:** Derived from [NbAiLab/nb-whisper-large](https://huggingface.co/NbAiLab/nb-whisper-large). |
|
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). |
|
|
|
### Key Features: |
|
- **Speed:** Faster inference with reduced computational requirements, suitable for edge devices. |
|
- **Lightweight:** Ideal for applications requiring lower memory usage. |
|
- **Accuracy Retention:** Maintains competitive performance in word error rate (WER) and character error rate (CER) benchmarks. |
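Both WER and CER are edit-distance metrics: the minimum number of insertions, deletions, and substitutions needed to turn the hypothesis into the reference, divided by the reference length. As a reference point, here is a minimal pure-Python sketch of how they are computed (illustrative only, not the exact evaluation script used for these benchmarks):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling-array DP)."""
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                           # deletion
                       d[j - 1] + 1,                       # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return d[len(hyp)]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("hallo på deg", "hallo paa deg"))  # one substitution in three words
```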
|
|
|
--- |
|
|
|
## Training and Distillation Details |
|
|
|
- **Distillation Process:** The model was distilled from the NB-Whisper Large model using a teacher-student framework to reduce model size while minimizing loss of accuracy. |
|
- **Datasets Used:** The same high-quality datasets as the original model, including: |
|
- NbAiLab/ncc_speech |
|
- NbAiLab/NST |
|
- NbAiLab/NPSC |
|
- **Training Steps:** Distillation involved several iterations of fine-tuning to achieve an optimal balance of size and performance. |
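In a logit-matching teacher-student setup, the student is trained to match the teacher's predictive distribution over the vocabulary, typically via a temperature-scaled KL divergence. The NumPy sketch below illustrates that objective in the abstract; the actual NB-Whisper distillation code may use a different loss or framework:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the vocabulary axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) averaged over positions, scaled by T^2
    to keep gradient magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

# Identical teacher and student logits give zero divergence
logits = np.random.randn(4, 10)
assert distillation_kl(logits, logits) < 1e-9
```

During training this term is usually combined with the ordinary cross-entropy loss on the ground-truth transcript.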
|
|
|
--- |
|
|
|
## How to Use |
|
|
|
|
|
### Local Setup |
|
To run locally, install the necessary libraries and use the Transformers pipeline: |
|
|
|
```bash
pip install "transformers>=4.35.2"
```
|
|
|
```python
from transformers import pipeline

# Load the distilled model
asr = pipeline("automatic-speech-recognition", "NbAiLab/nb-whisper-large-distil-turbo-beta")

# Transcribe audio
result = asr("example_audio.mp3", generate_kwargs={'task': 'transcribe', 'language': 'no'})
print(result["text"])
```
|
|
|
--- |
|
|
|
## Performance |
|
|
|
The distilled model achieves similar results to the full NB-Whisper Large model in many scenarios but is optimized for speed and resource efficiency. It is ideal for real-time applications such as live transcription or mobile usage. |
|
|
|
### Example Use Cases: |
|
- Real-time transcription on low-resource devices. |
|
- Speech analysis in applications requiring low-latency responses. |
|
- Edge deployment in mobile or embedded systems. |
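For long or streaming audio, the usual approach is to feed the model overlapping fixed-length windows; the Transformers pipeline does this internally when you pass `chunk_length_s` and `stride_length_s`. The sketch below illustrates only the windowing arithmetic, with illustrative (not tuned) values:

```python
import numpy as np

def chunk_audio(samples, sr=16000, chunk_s=30.0, stride_s=5.0):
    """Split a mono waveform into overlapping fixed-length windows,
    mirroring the chunk_length_s / stride_length_s mechanics of the
    Transformers ASR pipeline. Values are illustrative, not tuned."""
    chunk = int(chunk_s * sr)
    step = chunk - int(stride_s * sr)  # hop between window starts
    windows = []
    start = 0
    while True:
        windows.append(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break
        start += step
    return windows

# 70 s of audio at 16 kHz -> three overlapping 30 s windows (last one shorter)
audio = np.zeros(70 * 16000, dtype=np.float32)
windows = chunk_audio(audio)
```

In practice you would not chunk manually: passing `chunk_length_s=30` to the pipeline call achieves the same effect.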
|
|
|
--- |
|
|
|
### API |
|
Instructions for accessing the models via a simple API are included in the demos under Spaces. Note that these demos are temporary and will only be available for a few weeks. |
|
|
|
## Training Data |
|
The training data originates from Språkbanken and the National Library of Norway's digital collection, including: |
|
|
|
- NST Norwegian ASR Database (16 kHz) and its corresponding dataset |
|
- Transcribed speeches from the Norwegian Parliament by Språkbanken |
|
- TV broadcast (NRK) subtitles (NLN digital collection) |
|
- Audiobooks (NLN digital collection) |
|
|
|
## Downstream Use |
|
|
|
The models, especially the smaller ones, may occasionally hallucinate and may drop parts of the transcript. They are designed to convert spoken language into grammatically correct written sentences, which might not always be word-for-word transcriptions. We have made two extra model variants for users who want a different transcription style. We encourage users to try the models themselves to get a better understanding.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Using these models without adequate risk assessment and mitigation could be considered irresponsible. They may contain biases or other undesirable distortions. Users who deploy these models or integrate them into systems or services are responsible for mitigating risks and complying with applicable AI regulations. The National Library of Norway, as the model owner, disclaims liability for any outcomes resulting from third-party use of these models. |
|
|
|
### Software |
|
The model was trained using JAX/Flax and converted to PyTorch, TensorFlow, whisper.cpp, and ONNX formats. These are available under `Files and versions`. We welcome requests for conversion to other formats. All training code and scripts are released under the Apache License 2.0 in the GitHub repository [nb-whisper](https://github.com/NbAiLab/nb-whisper/).
|
|
|
## Citation & Contributors |
|
The NB-Whisper Large model is a product of the NoSTram project led by Per Egil Kummervold ([@pere](https://huggingface.co/pere)) at the National Library of Norway. Key contributors include Javier de la Rosa ([@versae](https://huggingface.co/versae)), Freddy Wetjen ([@freddyw](https://huggingface.co/freddyw)), and Rolv-Arild Braaten ([@Rolv-Arild](https://huggingface.co/Rolv-Arild)). NB AI-Lab, under the direction of Svein Arne Brygfjeld ([@Brygfjeld](https://huggingface.co/Brygfjeld)), supported the project's successful completion. A detailed paper on our process and findings is forthcoming. |
|
|
|
## Disclaimer |
|
|
|
The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using any of these models (or systems based on these models), or become users of the models themselves, it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of artificial intelligence. In no event shall the owner of the models (the National Library of Norway) be liable for any results arising from the use made by third parties of these models.
|
|
|
|
|
## Attribution |
|
|
|
This model is released under the Apache-2.0 license. Note that for downloads made in Norway, the attribution requirements specified in the Norwegian copyright act still apply where relevant, even if not explicitly mentioned in the Apache License. Although attribution might not be required if the model is downloaded and used in other countries, we strongly encourage following the practice of marking subtitles with “Undertekster generert av NB-Whisper Large Distil Turbo” or “Subtitles generated by NB-Whisper Large Distil Turbo.” This will also ensure that future ASR programs are not trained on machine-generated subtitles.
|
|
|
|
|
## Acknowledgements |
|
|
|
Our gratitude extends to [Google TPU Research Cloud](https://sites.research.google/trc/about/) for training resources, Google Cloud for translation credits, and Hugging Face's Sanchit Gandhi for technical support. A special thank you to Per Erik Solberg at Språkbanken for the collaboration on the Stortinget corpus.
|
|
|
## Contact |
|
For feedback, technical concerns, or collaboration inquiries, please contact <a rel="noopener nofollow" href="mailto:ailab@nb.no">ailab@nb.no</a>. If you plan to include this model in your research, contact us for the latest information on our upcoming paper for citation purposes. |
|
|
|
|
|
## Limitations and Risks |
|
|
|
While the distilled model is efficient, users may observe: |
|
- Slight performance drops compared to the original large model in some edge cases. |
|
- Potential biases or transcription inaccuracies inherited from the training data. |
|
|
|
Users are advised to evaluate the model for their specific use cases and mitigate risks as needed. |
|
|
|
--- |
|
|
|
## Citation & Contact |
|
|
|
If you use this model in your work, please cite the National Library of Norway. For more information or inquiries, contact [ailab@nb.no](mailto:ailab@nb.no). |