import streamlit as st

# Custom CSS for better styling
st.markdown(""" """, unsafe_allow_html=True)

# Main Title
st.markdown('HuBERT for Speech Recognition', unsafe_allow_html=True)

# Introduction
st.markdown("""

HuBERT (Hidden-Unit BERT) is a self-supervised speech representation model introduced in the paper "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units" by Wei-Ning Hsu et al. It clusters speech features offline to obtain discrete hidden units, then trains the model to predict those units for masked regions of the input. This lets it learn both acoustic and language representations from unsegmented, unannotated audio.
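
As a rough illustration of the masked-prediction idea, the pure-Python sketch below assigns each feature frame to its nearest cluster centroid (the "hidden unit"), masks a few contiguous spans, and averages the cross-entropy over masked frames only. All function names, the toy features, and the span settings are illustrative assumptions, not the paper's implementation. In the real model the logits come from a Transformer encoder, and the paper's loss can also weight in the unmasked frames.

```python
import math
import random

def nearest_unit(frame, centroids):
    # Index of the centroid closest to the frame (squared Euclidean distance).
    dists = [sum((f - c) ** 2 for f, c in zip(frame, centroid))
             for centroid in centroids]
    return dists.index(min(dists))

def assign_hidden_units(frames, centroids):
    # One discrete target per frame; these play the role of hidden units.
    return [nearest_unit(frame, centroids) for frame in frames]

def sample_mask(num_frames, span, num_spans, rng):
    # Mark num_spans contiguous spans of length span as masked (spans may overlap).
    masked = [False] * num_frames
    for _ in range(num_spans):
        start = rng.randrange(0, num_frames - span + 1)
        for t in range(start, start + span):
            masked[t] = True
    return masked

def masked_cross_entropy(logits, targets, masked):
    # Average cross-entropy over masked frames only.
    total, count = 0.0, 0
    for frame_logits, target, is_masked in zip(logits, targets, masked):
        if not is_masked:
            continue
        peak = max(frame_logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in frame_logits))
        total += log_norm - frame_logits[target]
        count += 1
    return total / count if count else 0.0

# Toy data: four 2-dimensional frames and two centroids (made-up values).
frames = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 1.1]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
units = assign_hidden_units(frames, centroids)
mask = sample_mask(len(frames), span=2, num_spans=1, rng=random.Random(0))
uniform_logits = [[0.0, 0.0] for _ in frames]  # stand-in for model outputs
loss = masked_cross_entropy(uniform_logits, units, mask)
```

With uniform logits over two units, the loss equals the natural log of 2, which is the value an untrained predictor would start from in this toy setup.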

""", unsafe_allow_html=True)

# Why, Where, and When to Use HuBERT
st.markdown('Why, Where, and When to Use HuBERT', unsafe_allow_html=True)

# Explanation Section
st.markdown("""

HuBERT is well suited to speech-to-text tasks that demand high transcription quality and to settings that need robust speech representations. Because it is pretrained without labels, it can make use of audio that is noisy or unannotated. Key use cases include:

""", unsafe_allow_html=True)

# Use Cases Section
st.markdown('Use Cases', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)

# How to Use the Model
st.markdown('HuBERT Pipeline in Spark NLP', unsafe_allow_html=True)
st.markdown("""

To use the HuBERT model in Spark NLP, follow the example code below. This code demonstrates how to assemble audio data and apply the HubertForCTC annotator to convert speech to text.
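
""", unsafe_allow_html=True)

st.code('''
from pyspark.ml import Pipeline
from sparknlp.base import AudioAssembler
from sparknlp.annotator import HubertForCTC

# audioDf is assumed to be a Spark DataFrame whose "audio_content"
# column holds each recording as an array of floats.

# Wrap the raw audio samples in Spark NLP's audio annotation format
audio_assembler = AudioAssembler()\\
    .setInputCol("audio_content")\\
    .setOutputCol("audio_assembler")

# Load the pretrained HuBERT model and transcribe the audio
speech_to_text = HubertForCTC.pretrained("asr_hubert_large_ls960", "en")\\
    .setInputCols("audio_assembler")\\
    .setOutputCol("text")

pipeline = Pipeline(stages=[
    audio_assembler,
    speech_to_text,
])

pipelineModel = pipeline.fit(audioDf)
pipelineDF = pipelineModel.transform(audioDf)
''', language='python')

# Model Information
st.markdown('Model Information', unsafe_allow_html=True)
st.markdown("""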
| Attribute | Description |
|---|---|
| Model Name | asr_hubert_large_ls960 |
| Compatibility | Spark NLP 4.3.0+ |
| License | Open Source |
| Edition | Official |
| Input Labels | [audio_assembler] |
| Output Labels | [text] |
| Language | en |
| Size | 1.5 GB |
""", unsafe_allow_html=True)

# Data Source Section
st.markdown('Data Source', unsafe_allow_html=True)
st.markdown("""

The HuBERT model is available on Hugging Face. It was fine-tuned on 960 hours of LibriSpeech data and expects speech audio sampled at 16 kHz. Make sure your input audio uses the same sampling rate, resampling it first if it does not.
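
To make the sampling-rate requirement concrete, here is a minimal standard-library sketch that reads a 16-bit mono WAV file into floats and resamples it to 16 kHz. The helper names `read_wav_mono16` and `resample_linear` are hypothetical, and the linear interpolation is a deliberately naive stand-in: it applies no anti-aliasing filter, so a dedicated audio library is the better choice for real data.

```python
import struct
import wave

TARGET_RATE = 16000  # sampling rate the model expects, per the text above

def read_wav_mono16(path):
    # Read a 16-bit mono PCM WAV file; return (samples in [-1, 1), sample_rate).
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [i / 32768.0 for i in ints], rate

def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    # Naive linear-interpolation resampler (no anti-aliasing filter).
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate      # fractional position in the source
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out
```

The resulting list of floats is the kind of value that would go into the `audio_content` column, assuming one recording per DataFrame row.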

""", unsafe_allow_html=True)

# Conclusion
st.markdown('Conclusion', unsafe_allow_html=True)
st.markdown("""

HuBERT offers a powerful approach to speech recognition built on self-supervised pretraining, especially in challenging audio environments. Its ability to learn from unannotated data by predicting masked speech units makes it a robust model for a variety of speech-related tasks. Integrated into Spark NLP, HuBERT is ready for large-scale deployment and supports applications ranging from transcription to feature extraction.

If you’re working on speech recognition projects that require resilience to noise and variability, HuBERT provides an advanced, scalable option.

""", unsafe_allow_html=True)

# References
st.markdown('References', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)

# Community & Support
st.markdown('Community & Support', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)