metadata

license: mit
base_model:
  - openai/whisper-large-v3-turbo
tags:
  - asr
  - optimizer
  - speech
  - audio
  - frequency

--Proof of concept-- in Beta... or theta.

This optimizer is designed specifically for ASR-type models, but it also works well without FAM. FAM is switched on by step count through the fam_start_step argument, e.g. fam_start_step=100.

An experimental approach specifically designed for speech recognition tasks, FAM adapts momentum based on the frequency characteristics of gradient updates.

Frequency-Adaptive Momentum (FAM)

Core Concept

Speech signals have an inherent frequency structure, and different parts of a model respond to different frequency bands. That structure is preserved, albeit transformed, when audio is converted to log-mel spectrograms, and the model's parameters adapt to capture it.
The Chain of Frequency Information: Original Audio → Log-Mel Spectrogram → Encoder Parameters → Gradient Updates.
Empirical observations reveal that transformer-based speech models develop:
- Lower encoder layers with filters responsive to specific frequency bands in the mel spectrogram.
- Attention heads tracking particular acoustic patterns over time.
- A hierarchical representation from acoustic features to phonetic units to words.
FAM aims to integrate a momentum scheme that adapts based on the "frequency signature" of gradient updates.

Why This Optimizer Makes Sense

FAM acknowledges the frequency structure within the optimization process itself, recognizing that:

- Gradient Frequencies Matter: The Fourier transform of gradient updates reveals patterns linked to the model's current learning phase.
- Different Parameters Process Different Bands: Similar to how our ears have frequency-specific receptors, different parts of the model specialize in various acoustic frequencies.
- Temporal Structure in Learning: Speech learning progresses through stages - from basic acoustics to phonetic patterns to linguistic structures.
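
To make the idea of a gradient's "frequency signature" concrete, here is a minimal, dependency-free sketch. It treats a flattened gradient as a 1-D signal, takes its discrete Fourier transform, and reports the share of spectral energy that falls in each band. The helper names (dft_magnitudes, band_energies) and the equal-width band layout are assumptions made for illustration, not the repository's implementation. In practice an FFT routine such as torch.fft.rfft would replace the naive DFT loop.

```python
import cmath
import math


def dft_magnitudes(values):
    """Magnitudes of the non-negative-frequency DFT bins of a real sequence."""
    n = len(values)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        mags.append(abs(acc))
    return mags


def band_energies(flat_grad, n_bands=10):
    """Share of spectral energy per band, ordered from low to high frequency.

    Assumption for this sketch: bands are contiguous and of (nearly) equal width.
    """
    power = [m * m for m in dft_magnitudes(flat_grad)]
    n_bands = min(n_bands, len(power))
    edges = [i * len(power) // n_bands for i in range(n_bands + 1)]
    energies = [sum(power[edges[i]:edges[i + 1]]) for i in range(n_bands)]
    total = sum(energies) or 1.0  # avoid dividing by zero for an all-zero gradient
    return [e / total for e in energies]


# A slowly varying "gradient" puts its energy in the lowest band,
# while a sign-alternating one puts it in the highest band.
smooth = [math.sin(2 * math.pi * t / 64) for t in range(64)]
noisy = [(-1.0) ** t for t in range(64)]
smooth_sig = band_energies(smooth, n_bands=4)
noisy_sig = band_energies(noisy, n_bands=4)
```

The resulting list sums to 1, so it can be compared across parameters of different sizes and across training steps.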

By applying distinct momentum factors to different frequency bands in parameter space, FAM provides the optimizer with domain-specific audio information that it otherwise wouldn't have.
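
One simple way to picture band-dependent momentum is sketched below. Given the share of gradient energy in each frequency band, it blends hypothetical per-band momentum factors into a single coefficient and uses that coefficient in an exponential-moving-average momentum update. The function names, the max_damping parameter, the linear rule that gives higher bands less momentum, and the step >= fam_start_step gate are all assumptions for illustration; the repository's actual update rule may differ.

```python
def effective_beta(band_share, base_beta=0.99, max_damping=0.2):
    """Blend per-band momentum factors into one coefficient.

    band_share: fraction of gradient spectral energy per band (sums to 1),
    ordered from lowest to highest frequency.
    Hypothetical rule: band i uses base_beta * (1 - max_damping * i / (n - 1)),
    so the highest band keeps the least momentum.
    """
    n = len(band_share)
    if n == 1:
        return base_beta
    band_betas = [base_beta * (1.0 - max_damping * i / (n - 1)) for i in range(n)]
    return sum(share * beta for share, beta in zip(band_share, band_betas))


def momentum_step(buf, grad, step, band_share, base_beta=0.99, fam_start_step=100):
    """One EMA momentum update: buf <- beta * buf + (1 - beta) * grad.

    Before fam_start_step the plain base_beta is used; from that step on,
    the band-aware coefficient takes over (assumed gating rule).
    """
    if step >= fam_start_step:
        beta = effective_beta(band_share, base_beta)
    else:
        beta = base_beta
    return [beta * m + (1.0 - beta) * g for m, g in zip(buf, grad)]


# Example: a gradient whose energy sits entirely in the highest of four bands.
high_freq_share = [0.0, 0.0, 0.0, 1.0]
before = momentum_step([1.0], [0.0], step=10, band_share=high_freq_share)
after = momentum_step([1.0], [0.0], step=100, band_share=high_freq_share)
```

In this sketch the momentum buffer decays faster once FAM is active and the gradient is dominated by high-frequency content, while a low-frequency gradient leaves the behavior identical to plain momentum.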

Download and test it for free! :D

https://github.com/sine2pi/FAMOptimizer

Usage example

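# FAMOptimizer, FAMScheduler, and get_parameter_groups come from the repository linked above.
# `model` is the model being trained.
param_groups = get_parameter_groups(model=model, lr=0.001, weight_decay=1e-6)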

optimizer = FAMOptimizer(
    params=param_groups,
    beta=0.99,
    n_bands=10,
    fam_start_step=100,  # FAM is turned on by step count
    layer_boost=True,
    min_size=128,
    debug=True,
    weight_decay=0.0025,
    lr=0.001,
)

scheduler = FAMScheduler(
    optimizer=optimizer,
    warmup_steps=100,
    total_steps=10000,
    decay_start_step=100
)
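
The scheduler arguments above suggest a warmup phase followed by a decay phase. The sketch below shows one plausible reading of them: linear warmup to the base learning rate, an optional hold until decay_start_step, then cosine decay until total_steps. The function name lr_at, the min_lr parameter, and the choice of linear warmup with cosine decay are assumptions for illustration and are not taken from FAMScheduler's source.

```python
import math


def lr_at(step, base_lr=0.001, warmup_steps=100, total_steps=10000,
          decay_start_step=100, min_lr=0.0):
    """Hypothetical learning-rate curve: warmup, hold, then cosine decay."""
    if step < warmup_steps:
        # Linear warmup that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    if step < decay_start_step:
        # Hold at base_lr until decay begins (empty when the two steps coincide).
        return base_lr
    span = max(1, total_steps - decay_start_step)
    progress = min(1.0, (step - decay_start_step) / span)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Sample the curve at a few steps to inspect its shape.
samples = {s: lr_at(s) for s in (0, 99, 100, 5050, 10000)}
```

With warmup_steps and decay_start_step both set to 100, as in the usage example, the hold phase is empty and decay begins as soon as warmup ends.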