homebrewltd/Speechless-llama3.2-v0.1

Speechless

Speechless is a compact, open-source text-to-semantics model with 1B parameters. It generates semantic representations of audio directly as discrete tokens, bypassing the need for a text-to-speech (TTS) model. Unlike traditional pipelines that generate and then process audio (TTS → ASR), Speechless converts text directly into semantic speech tokens, which simplifies training, saves resources, and scales more easily, especially for low-resource languages.

Trained on roughly 400 hours of English and roughly 1000 hours of Vietnamese data, Speechless is a core component of the Ichigo v0.5 family.

For more details, check out our official blog post.

Model Summary

Developed by: Homebrew Research.

Model Architecture: Llama

Model type: Text to Semantics

Language(s): English and Vietnamese

License: Apache 2.0

Resources

Blog: Blog post

Intended Use

Intended Use Cases: This model is primarily designed for research purposes. This version focuses on generating semantic representations of audio directly as discrete tokens, eliminating the need for a text-to-speech (TTS) model.

Out-of-scope: The use of Speechless in any manner that violates applicable laws or regulations is strictly prohibited.

How to Get Started

Use the example code below to load the model and generate semantic tokens from text.

import torch
from transformers import pipeline

model_id = "homebrewltd/Speechless-llama3.2-v0.1"

pipe = pipeline(
    "text-generation", 
    model=model_id, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)

# The prompt starts with the special token used in this example, followed by the input text.
output = pipe("<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research")
print(output)

# Example output:
# [{'generated_text': '<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research.assistant\n\n<|sound_1968|><|sound_0464|><|sound_0642|><|duration_02|><|sound_0634|><|sound_0105|><|duration_02|><|sound_1745|><|duration_02|><|sound_1345|><|sound_0210|><|sound_1312|><|sound_1312|>'}]
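The generated text mixes the original prompt with `<|sound_XXXX|>` and `<|duration_XX|>` tokens. The following is a minimal sketch of how those tokens could be pulled out of the string for downstream use. The token formats are taken from the example output above; the helper name `parse_semantic_tokens` and the regular expression are illustrative assumptions, not part of the model's API.

```python
import re

# Token pattern inferred from the example output; an assumption, not an official spec.
TOKEN_RE = re.compile(r"<\|(sound|duration)_(\d+)\|>")


def parse_semantic_tokens(generated_text: str) -> list[tuple[str, int]]:
    """Return (kind, index) pairs for every sound/duration token, in order."""
    return [(kind, int(value)) for kind, value in TOKEN_RE.findall(generated_text)]


# Shortened version of the example output shown above.
example = (
    "<|reserved_special_token_69|>I'm Speechless.assistant\n\n"
    "<|sound_1968|><|sound_0464|><|sound_0642|><|duration_02|><|sound_0634|>"
)

tokens = parse_semantic_tokens(example)
print(tokens)
```

Other special tokens, such as the reserved prompt prefix, do not match the pattern and are skipped.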

Training Specs

Parameter	Value
Epochs	2
Global Batch Size	144
Learning Rate	3e-4
Learning Rate Scheduler	Cosine
Optimizer	AdamW
Warmup Ratio	0.05
Weight Decay	0.01
Max Sequence Length	512
Clip Grad Norm	1.0
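To make the learning-rate settings concrete, here is a small sketch of a cosine schedule with warmup using the peak learning rate (3e-4) and warmup ratio (0.05) from the table. It assumes a linear warmup from zero and a cosine decay to zero, which is a common convention; the exact schedule used in training is not specified here, and `total_steps=1000` is a hypothetical value chosen only for illustration.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 3e-4,
               warmup_ratio: float = 0.05) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real step count depends on dataset size
for step in (0, 25, 50, 525, 1000):
    print(step, lr_at_step(step, total_steps))
```

With these assumptions the rate peaks at the end of warmup (step 50 of 1000) and falls to half the peak midway through the decay phase.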

Evaluation

Vietnamese

Model Name	Test dataset	Test samples	WER (%)
Speechless v0.1	viet_bud500	7500	3.99

English

Model Name	Test dataset	Test samples	WER (%)
Speechless v0.1	librispeech_asr	2620	3.27
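For reference, word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition. It is not the evaluation script used for the numbers above, and it assumes the reported values are percentages; how transcripts were obtained from the model's output is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {100 * wer:.2f}%")
```

In this toy example there is one substitution and one deletion against six reference words, so the rate is 2/6.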

Citation Information

BibTeX:

@article{Speechless2024,
  title={Speechless},
  author={Homebrew Research},
  year={2024},
  month={December},
  url={https://huggingface.co/homebrewltd/Speechless-llama3.2-v0.1}
}

Acknowledgement

WhisperSpeech
Llama3.2
