Model

This repository contains the second version of our Automatic Speech Recognition and Subtitle Generation model, which has an improved architecture and was trained on 14,000 hours of subtitled Flemish broadcast speech. It can generate both an exact verbatim transcription with annotation tags and a fully formatted, cleaned-up subtitle transcription. Each of the two output types is produced by a separate decoder.

This repository contains the large variant of the model with 180M parameters.

Version: April 2024

Usage

This repository hosts only the pre-trained model and its configuration files. To download this model, see the instructions here.

This model, like our other ASR models, is used through our GitHub codebase. Please refer to the GitHub repository for installation instructions.

Webservice

This model can also be accessed through the webservice of the NeLF Project. After requesting access, you can upload audio or video files, which are then transcribed according to the settings you select.

Citation

If you use this model, please cite the research paper:

@article{poncelet2024,
    author = {Poncelet, Jakob and Van hamme, Hugo},
    title = {Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling},
    year = {2024},
    journal = {arXiv preprint arXiv:2502.03212},
    url = {https://arxiv.org/abs/2502.03212}
}

Contact

Jakob Poncelet: jakob.poncelet@kuleuven.be