---
license: mit
base_model:
- openai/whisper-large-v3-turbo
tags:
- asr
- optimizer
- speech
- audio
- frequency
---
**Proof of concept**, in beta... or theta.

This optimizer is designed for ASR-type models, but it also works well without FAM. FAM is switched on by step count through `fam_start_step` (for example, `fam_start_step=100`).

FAM is an experimental approach designed specifically for speech recognition tasks: it adapts momentum based on the frequency characteristics of gradient updates.
### Frequency-Adaptive Momentum (FAM)

#### Core Concept

- Speech signals have an inherent frequency structure, and different parts of the model respond to different frequency bands. That structure is preserved, albeit transformed, when audio is converted to log-mel spectrograms, and model parameters adapt to capture it.
- The chain of frequency information: Original Audio → Log-Mel Spectrogram → Encoder Parameters → Gradient Updates.
- Empirical observations reveal that transformer-based speech models develop:
  - Lower encoder layers with filters responsive to specific frequency bands in the mel spectrogram.
  - Attention heads that track particular acoustic patterns over time.
  - A hierarchical representation from acoustic features to phonetic units to words.
- FAM aims to provide a momentum scheme that adapts to the "frequency signature" of gradient updates.
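As a rough illustration of what a "frequency signature" could look like, the sketch below takes a flattened gradient, computes its discrete Fourier transform, and reports how the spectral energy is spread over `n_bands` equal-width bands. The function name `band_energies` and the equal-width banding are assumptions made for this example, not the FAMOptimizer implementation. A plain-Python DFT keeps it dependency-free; real code would more likely call `torch.fft.rfft`.

```python
import cmath
import math


def band_energies(grad, n_bands=10):
    """Return each band's share of the gradient's spectral energy.

    Hypothetical helper: `grad` is a flat list of floats, and the
    spectrum is split into `n_bands` equal-width bands (0 = lowest).
    """
    n = len(grad)
    n_freqs = n // 2 + 1  # non-redundant DFT bins for a real-valued input
    power = []
    for k in range(n_freqs):
        coeff = sum(g * cmath.exp(-2j * math.pi * k * t / n)
                    for t, g in enumerate(grad))
        power.append(abs(coeff) ** 2)

    band_size = math.ceil(n_freqs / n_bands)
    bands = [sum(power[i:i + band_size]) for i in range(0, n_freqs, band_size)]
    bands += [0.0] * (n_bands - len(bands))  # pad when there are fewer bins than bands
    total = sum(bands) or 1.0                # avoid dividing by zero for an all-zero gradient
    return [b / total for b in bands]


# Two synthetic "gradients": one varies slowly across the parameter index,
# the other oscillates quickly.
slow = [math.sin(2 * math.pi * 2 * t / 128) for t in range(128)]
fast = [math.sin(2 * math.pi * 60 * t / 128) for t in range(128)]

slow_sig = band_energies(slow)
fast_sig = band_energies(fast)
print("dominant band (slow):", slow_sig.index(max(slow_sig)))
print("dominant band (fast):", fast_sig.index(max(fast_sig)))
```

The slowly varying gradient concentrates its energy in the lowest band, while the rapidly oscillating one lands in a high band. That distribution is the kind of signal a frequency-adaptive momentum rule could key on.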
#### Why This Optimizer Makes Sense

FAM acknowledges the frequency structure within the optimization process itself, recognizing that:
- **Gradient Frequencies Matter:** The Fourier transform of gradient updates reveals patterns linked to the model's current learning phase.
- **Different Parameters Process Different Bands:** Much as our ears have frequency-specific receptors, different parts of the model specialize in different acoustic frequencies.
- **Temporal Structure in Learning:** Speech learning progresses through stages, from basic acoustics to phonetic patterns to linguistic structures.

By applying distinct momentum factors to different frequency bands in parameter space, FAM provides the optimizer with domain-specific audio information that it otherwise wouldn't have.
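One way to picture "distinct momentum factors per frequency band" is sketched below. The gradient is moved into the frequency domain, each DFT bin gets an exponential-moving-average update whose `beta` depends on the band it falls in, and the result is transformed back. Everything here (the helper names `dft`, `idft`, `band_of`, `banded_momentum`, the EMA form, and the choice of heavier smoothing for the low band) is an assumption for illustration and may differ from what FAMOptimizer does internally.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n)
                 for k, c in enumerate(spec)) / n).real
            for t in range(n)]


def band_of(k, n, n_bands):
    """Map DFT bin k of a length-n signal to a band index (0 = lowest)."""
    freq = min(k, n - k)  # bins k and n-k carry the same frequency
    return freq * n_bands // (n // 2 + 1)


def banded_momentum(grad, state, band_betas):
    """One momentum update with a separate beta per frequency band.

    `state` is the running momentum kept in the frequency domain and is
    updated in place; the smoothed update is returned in parameter space.
    """
    n = len(grad)
    spec = dft(grad)
    for k in range(n):
        beta = band_betas[band_of(k, n, len(band_betas))]
        state[k] = beta * state[k] + (1 - beta) * spec[k]
    return idft(state)


n = 64
band_betas = [0.99, 0.5]  # hypothetical: heavy smoothing for the low band, light for the high band
state = [0j] * n
low = [math.cos(2 * math.pi * 1 * t / n) for t in range(n)]
high = [math.cos(2 * math.pi * 30 * t / n) for t in range(n)]
grad = [a + b for a, b in zip(low, high)]

update = banded_momentum(grad, state, band_betas)
```

After a single step from a zero state, the low-frequency component of the gradient is scaled by `1 - 0.99` and the high-frequency component by `1 - 0.5`, so the two bands are smoothed at different rates even though they arrive in the same gradient.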
Download and test it for free! :D

https://github.com/sine2pi/FAMOptimizer
#### Usage Example

```python
# FAMOptimizer, FAMScheduler and get_parameter_groups are defined in the
# repository linked above; import them from wherever you place that code.
# `model` is the model being trained (for example, a Whisper model).

# Build the parameter groups for the model.
param_groups = get_parameter_groups(model=model, lr=0.001, weight_decay=1e-6)

optimizer = FAMOptimizer(
    params=param_groups,
    beta=0.99,
    n_bands=10,
    fam_start_step=100,  # FAM is turned on by step count
    layer_boost=True,
    min_size=128,
    debug=True,
    weight_decay=0.0025,
    lr=0.001,
)

scheduler = FAMScheduler(
    optimizer=optimizer,
    warmup_steps=100,
    total_steps=10000,
    decay_start_step=100,
)
```
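The three scheduler arguments suggest a warmup phase followed by a decay phase. The sketch below shows one common shape using those parameter names: linear warmup, an optional hold, then cosine decay. The cosine shape, the `min_lr` argument, and the helper name `lr_at` are assumptions for illustration; check the repository for what `FAMScheduler` actually computes.

```python
import math


def lr_at(step, base_lr=0.001, warmup_steps=100, total_steps=10000,
          decay_start_step=100, min_lr=0.0):
    """Hypothetical learning-rate curve: linear warmup, hold, cosine decay."""
    if step < warmup_steps:
        # Ramp linearly so the final warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    if step < decay_start_step:
        # Hold at base_lr until decay begins (empty when both steps are equal).
        return base_lr
    span = max(1, total_steps - decay_start_step)
    progress = min(1.0, (step - decay_start_step) / span)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Sample the curve at a few steps using the values from the usage example.
for step in (0, 50, 100, 5050, 10000):
    print(step, lr_at(step))
```

With `warmup_steps` equal to `decay_start_step`, as in the usage example, the hold phase is empty and decay begins as soon as warmup ends.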