|
--- |
|
license: apache-2.0 |
|
language: |
|
- 'no' |
|
- nb |
|
- nn |
|
- en |
|
datasets: |
|
- NbAiLab/ncc_speech |
|
- NbAiLab/NST |
|
- NbAiLab/NPSC |
|
base_model: NbAiLab/nb-whisper-large-distil-turbo-beta |
|
tags: |
|
- audio |
|
- asr |
|
- automatic-speech-recognition |
|
metrics: |
|
- wer |
|
- cer |
|
library_name: transformers |
|
pipeline_tag: automatic-speech-recognition |
|
widget: |
|
- src: https://datasets-server.huggingface.co/assets/google/fleurs/--/nb_no/train/1/audio/audio.mp3 |
|
example_title: FLEURS sample 1 |
|
- src: https://datasets-server.huggingface.co/assets/google/fleurs/--/nb_no/train/4/audio/audio.mp3 |
|
example_title: FLEURS sample 2 |
|
--- |
|
|
|
|
|
|
|
# NB-Whisper Large Distil Turbo
|
|
|
Introducing **_NB Whisper Large Distil Turbo_**, a lighter, faster version of the Norwegian ASR model developed by the National Library of Norway. This distilled model maintains strong transcription quality while being optimized for resource-constrained environments. |
|
It is derived from the original NB-Whisper Large model through a distillation process, reducing the number of parameters while preserving performance for Automatic Speech Recognition (ASR) tasks. |
|
|
|
--- |
|
|
|
## Model Summary |
|
|
|
- **Model Size:** Reduced from 1550M parameters (Large) to 756M parameters (distilled).
|
- **Languages Supported:** Norwegian (Bokmål). |
|
- **Base Model:** Derived from [NbAiLab/nb-whisper-large](https://huggingface.co/NbAiLab/nb-whisper-large). |
|
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). |
|
|
|
### Key Features: |
|
- **Speed:** Faster inference with reduced computational requirements, suitable for edge devices. |
|
- **Lightweight:** Ideal for applications requiring lower memory usage. |
|
- **Accuracy Retention:** Maintains competitive performance in word error rate (WER) and character error rate (CER) benchmarks. |
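Both WER and CER are edit-distance metrics: the minimum number of insertions, deletions, and substitutions needed to turn the hypothesis into the reference, divided by the reference length. As a reference point, here is a minimal pure-Python sketch of how they are computed (illustrative only, not the exact evaluation script used for these benchmarks):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling-array DP)."""
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                           # deletion
                       d[j - 1] + 1,                       # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return d[len(hyp)]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("hallo på deg", "hallo paa deg"))  # one substitution in three words
```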
|
|
|
--- |
|
|
|
## Training and Distillation Details |
|
|
|
- **Distillation Process:** The model was distilled from the NB-Whisper Large model using a teacher-student framework to reduce model size while minimizing loss of accuracy. |
|
- **Datasets Used:** The same high-quality datasets as the original model, including: |
|
- NbAiLab/ncc_speech |
|
- NbAiLab/NST |
|
- NbAiLab/NPSC |
|
- **Training Steps:** Distillation involved several iterations of fine-tuning to achieve an optimal balance of size and performance. |
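In a logit-matching teacher-student setup, the student is trained to match the teacher's predictive distribution over the vocabulary, typically via a temperature-scaled KL divergence. The NumPy sketch below illustrates that objective in the abstract; the actual NB-Whisper distillation code may use a different loss or framework:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the vocabulary axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) averaged over positions, scaled by T^2
    to keep gradient magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

# Identical teacher and student logits give zero divergence
logits = np.random.randn(4, 10)
assert distillation_kl(logits, logits) < 1e-9
```

During training this term is usually combined with the ordinary cross-entropy loss on the ground-truth transcript.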
|
|
|
--- |
|
|
|
## How to Use |
|
|
|
|
|
### Local Setup |
|
To run locally, install the necessary libraries and use the Transformers pipeline: |
|
|
|
```bash
pip install "transformers>=4.35.2"
```
|
|
|
```python
from transformers import pipeline

# Load the distilled model
asr = pipeline("automatic-speech-recognition", "NbAiLab/nb-whisper-large-distil-turbo-beta")

# Transcribe audio
result = asr("example_audio.mp3", generate_kwargs={'task': 'transcribe', 'language': 'no'})
print(result["text"])
```
|
|
|
--- |
|
|
|
## Performance |
|
|
|
The distilled model achieves similar results to the full NB-Whisper Large model in many scenarios but is optimized for speed and resource efficiency. It is ideal for real-time applications such as live transcription or mobile usage. |
|
|
|
### Example Use Cases: |
|
- Real-time transcription on low-resource devices. |
|
- Speech analysis in applications requiring low-latency responses. |
|
- Edge deployment in mobile or embedded systems. |
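For long or streaming audio, the usual approach is to feed the model overlapping fixed-length windows; the Transformers pipeline does this internally when you pass `chunk_length_s` and `stride_length_s`. The sketch below illustrates only the windowing arithmetic, with illustrative (not tuned) values:

```python
import numpy as np

def chunk_audio(samples, sr=16000, chunk_s=30.0, stride_s=5.0):
    """Split a mono waveform into overlapping fixed-length windows,
    mirroring the chunk_length_s / stride_length_s mechanics of the
    Transformers ASR pipeline. Values are illustrative, not tuned."""
    chunk = int(chunk_s * sr)
    step = chunk - int(stride_s * sr)  # hop between window starts
    windows = []
    start = 0
    while True:
        windows.append(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break
        start += step
    return windows

# 70 s of audio at 16 kHz -> three overlapping 30 s windows (last one shorter)
audio = np.zeros(70 * 16000, dtype=np.float32)
windows = chunk_audio(audio)
```

In practice you would not chunk manually: passing `chunk_length_s=30` to the pipeline call achieves the same effect.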
|
|
|
--- |
|
|
|
### API |
|
Instructions for accessing the models via a simple API are included in the demos under Spaces. Note that these demos are temporary and will only be available for a few weeks. |
|
|
|
## Training Data |
|
The training data originates from Språkbanken and the National Library of Norway's digital collection, including: |
|
|
|
- NST Norwegian ASR Database (16 kHz) and its corresponding dataset |
|
- Transcribed speeches from the Norwegian Parliament by Språkbanken |
|
- TV broadcast (NRK) subtitles (NLN digital collection) |
|
- Audiobooks (NLN digital collection) |
|
|
|
## Downstream Use |
|
|
|
The models, especially the smaller ones, may occasionally hallucinate and may drop parts of the transcript. They are designed to convert spoken language into grammatically correct written sentences, which might not always be word-for-word transcriptions. We have made two extra model variants for users who want a different transcription style. We encourage users to try the models themselves to get a better understanding.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Using these models without adequate risk assessment and mitigation could be considered irresponsible. They may contain biases or other undesirable distortions. Users who deploy these models or integrate them into systems or services are responsible for mitigating risks and complying with applicable AI regulations. The National Library of Norway, as the model owner, disclaims liability for any outcomes resulting from third-party use of these models. |
|
|
|
### Software |
|
The model was trained using JAX/Flax and converted to PyTorch, TensorFlow, whisper.cpp, and ONNX formats. These are available under `Files and versions`. We welcome requests for conversion to other formats. All training code and scripts are released under the Apache License 2.0 in the GitHub repository [nb-whisper](https://github.com/NbAiLab/nb-whisper/).
|
|
|
## Citation & Contributors |
|
The NB-Whisper Large model is a product of the NoSTram project led by Per Egil Kummervold ([@pere](https://huggingface.co/pere)) at the National Library of Norway. Key contributors include Javier de la Rosa ([@versae](https://huggingface.co/versae)), Freddy Wetjen ([@freddyw](https://huggingface.co/freddyw)), and Rolv-Arild Braaten ([@Rolv-Arild](https://huggingface.co/Rolv-Arild)). NB AI-Lab, under the direction of Svein Arne Brygfjeld ([@Brygfjeld](https://huggingface.co/Brygfjeld)), supported the project's successful completion. A detailed paper on our process and findings is forthcoming. |
|
|
|
## Disclaimer |
|
|
|
The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using any of these models (or systems based on these models), or become users of the models themselves, it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of artificial intelligence. In no event shall the owner of the models (the National Library of Norway) be liable for any results arising from the use made by third parties of these models.
|
|
|
|
|
## Attribution |
|
|
|
This model is released under the Apache-2.0 license. Note that for downloads made in Norway, the attribution requirements specified in the Norwegian copyright act still apply where relevant, even if not explicitly mentioned in the Apache License. Although attribution might not be required if the model is downloaded and used in other countries, we strongly encourage following the practice of marking subtitles with “Undertekster generert av NB-Whisper Large Distil Turbo” or “Subtitles generated by NB-Whisper Large Distil Turbo.” This will also ensure that future ASR programs are not trained on machine-generated subtitles.
|
|
|
|
|
## Acknowledgements |
|
|
|
Our gratitude extends to [Google TPU Research Cloud](https://sites.research.google/trc/about/) for training resources, Google Cloud for translation credits, and Hugging Face's Sanchit Gandhi for technical support. A special thank you to Per Erik Solberg at Språkbanken for the collaboration on the Stortinget corpus.
|
|
|
## Contact |
|
For feedback, technical concerns, or collaboration inquiries, please contact <a rel="noopener nofollow" href="mailto:ailab@nb.no">ailab@nb.no</a>. If you plan to include this model in your research, contact us for the latest information on our upcoming paper for citation purposes. |
|
|
|
|
|
## Limitations and Risks |
|
|
|
While the distilled model is efficient, users may observe: |
|
- Slight performance drops compared to the original large model in some edge cases. |
|
- Potential biases or transcription inaccuracies inherited from the training data. |
|
|
|
Users are advised to evaluate the model for their specific use cases and mitigate risks as needed. |
|
|
|
--- |
|
|
|
## Citation & Contact |
|
|
|
If you use this model in your work, please cite the National Library of Norway. For more information or inquiries, contact [ailab@nb.no](mailto:ailab@nb.no). |