Speechless
Speechless is a compact, open-source text-to-semantics model (1B parameters). It generates semantic representations of speech as discrete tokens directly from text, bypassing the need for a text-to-speech (TTS) model. Traditional pipelines first synthesize audio and then process that audio (TTS → ASR) to obtain such representations. Speechless removes that step by converting text directly into semantic speech tokens, which simplifies training, saves resources, and makes the approach easier to scale, especially for low-resource languages.
Trained on approximately 400 hours of English and 1,000 hours of Vietnamese data, Speechless is a core component of the Ichigo v0.5 family.
For more details, check out our official blog post.
Model Summary
Developed by: Homebrew Research.
Model Architecture: Llama
Model type: Text to Semantics
Language(s): English and Vietnamese
License: Apache 2.0
Resources
Blog: Blog post
Intended Use
Intended Use Cases: This model is primarily designed for research purposes. This version focuses on generating semantic representations of speech as discrete tokens directly from text, eliminating the need for a text-to-speech (TTS) model.
Out-of-scope: Using Speechless in any manner that violates applicable laws or regulations is strictly prohibited.
How to Get Started
You can load and run the model with the Hugging Face `transformers` pipeline using the example code below.
```python
import torch
from transformers import pipeline

model_id = "homebrewltd/Speechless-llama3.2-v0.1"

# Load the model as a text-generation pipeline in bfloat16;
# device_map="auto" places the weights on the available device(s).
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The input text is prefixed with <|reserved_special_token_69|>.
output = pipe("<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research")
print(output)
# Example output:
# [{'generated_text': '<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research.assistant\n\n<|sound_1968|><|sound_0464|><|sound_0642|><|duration_02|><|sound_0634|><|sound_0105|><|duration_02|><|sound_1745|><|duration_02|><|sound_1345|><|sound_0210|><|sound_1312|><|sound_1312|>'}]
```
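The generated text contains the prompt, an `assistant` marker, and then a run of speech tokens of the form `<|sound_NNNN|>` and `<|duration_NN|>`. The card does not document what the duration tokens encode or how downstream components consume the sequence, so the following is only a minimal sketch, assuming those two token patterns exactly as they appear in the example output, that extracts them in order for inspection. `parse_speech_tokens` and `sound_ids` are hypothetical helpers, not part of the model or of `transformers`.

```python
import re
from typing import List, Tuple

# Assumed token patterns, taken from the example output: <|sound_NNNN|> and
# <|duration_NN|>. Any other special token (e.g. the prompt prefix) is ignored.
TOKEN_RE = re.compile(r"<\|(sound|duration)_(\d+)\|>")


def parse_speech_tokens(generated_text: str) -> List[Tuple[str, int]]:
    """Return (kind, index) pairs for each sound/duration token, in order."""
    return [(kind, int(num)) for kind, num in TOKEN_RE.findall(generated_text)]


def sound_ids(tokens: List[Tuple[str, int]]) -> List[int]:
    """Keep only the integer indices of the sound tokens."""
    return [index for kind, index in tokens if kind == "sound"]


if __name__ == "__main__":
    sample = (
        "<|reserved_special_token_69|>Hi.assistant\n\n"
        "<|sound_1968|><|sound_0464|><|duration_02|><|sound_0634|>"
    )
    tokens = parse_speech_tokens(sample)
    print(tokens)
    print(sound_ids(tokens))
```

With the pipeline above, the same helpers can be applied to `output[0]["generated_text"]`.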
Training Specs
Parameter | Value |
---|---|
Epochs | 2 |
Global Batch Size | 144 |
Learning Rate | 3e-4 |
Learning Rate Scheduler | Cosine |
Optimizer | AdamW |
Warmup Ratio | 0.05 |
Weight Decay | 0.01 |
Max Sequence Length | 512 |
Clip Grad Norm | 1.0 |
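The table lists a cosine scheduler with a warmup ratio of 0.05 and a peak learning rate of 3e-4, but not the exact schedule shape or the total number of optimizer steps. The sketch below assumes linear warmup from 0 over `ceil(0.05 * total_steps)` steps followed by cosine decay to 0, the same shape as `get_cosine_schedule_with_warmup` in `transformers`; the card does not say which implementation was actually used. `lr_at_step` and the step count in the usage are hypothetical, and the other settings (AdamW, weight decay, gradient clipping) are not modeled here.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 3e-4,
               warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay.

    Assumptions (not stated in the card): warmup is linear from 0, its length
    is ceil(warmup_ratio * total_steps), and the cosine decays to 0 at the
    final step.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical; the card does not give the number of steps
    for s in (0, 25, 50, 525, 1000):
        print(s, lr_at_step(s, total))
```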
Evaluation
- Vietnamese
Model Name | Test Dataset | Test Samples | WER |
---|---|---|---|
Speechless v0.1 | viet_bud500 | 7500 | 3.99 |
- English
Model Name | Test Dataset | Test Samples | WER |
---|---|---|---|
Speechless v0.1 | librispeech_asr | 2620 | 3.27 |
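The tables report word error rate (WER), but the card does not describe how the generated semantic tokens were turned back into text for scoring or which text normalization was applied. As an illustration of the metric only, the sketch below implements the standard word-level definition, (substitutions + deletions + insertions) / number of reference words, via Levenshtein distance over whitespace-split words. `word_error_rate` is a hypothetical helper and is not claimed to reproduce the reported numbers.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Uses plain whitespace tokenization; any normalization used in the actual
    evaluation (casing, punctuation) is not reproduced here.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein DP over words: prev[j] is the edit distance
    # between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```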
Citation Information
BibTeX:
```bibtex
@article{speechless2024,
  title={Speechless},
  author={{Homebrew Research}},
  year={2024},
  month={December},
  url={https://huggingface.co/homebrewltd/Speechless-llama3.2-v0.1}
}
```
Acknowledgement