---
license: mit
base_model:
- openai/whisper-large-v3-turbo
tags:
- asr
- optimizer
- speech
- audio
- frequency
---

Proof of concept, in beta... or theta.

This optimizer is designed for ASR-type models, but it also works well with FAM disabled. FAM can be switched on after a given number of steps via `fam_start_step` (e.g. `fam_start_step=100`). An experimental approach designed for speech recognition tasks, FAM adapts momentum based on the frequency characteristics of gradient updates.

### Frequency-Adaptive Momentum (FAM)

#### Core Concept

- Speech signals possess an inherent frequency structure, with different parts of the model responding to different frequency bands. This structure is preserved, albeit transformed, in log-mel spectrograms, and model parameters adapt to capture it.
- The chain of frequency information: Original Audio → Log-Mel Spectrogram → Encoder Parameters → Gradient Updates.
- Empirical observations reveal that transformer-based speech models develop:
  - Lower encoder layers with filters responsive to specific frequency bands in the mel spectrogram.
  - Attention heads that track particular acoustic patterns over time.
  - A hierarchical representation from acoustic features to phonetic units to words.
- FAM integrates a momentum scheme that adapts based on the "frequency signature" of gradient updates.

#### Why This Optimizer Makes Sense

FAM acknowledges the frequency structure within the optimization process itself, recognizing that:

- **Gradient frequencies matter:** The Fourier transform of gradient updates reveals patterns linked to the model's current learning phase.
- **Different parameters process different bands:** Much as the ear has frequency-specific receptors, different parts of the model specialize in different acoustic frequency ranges.
- **Temporal structure in learning:** Speech learning progresses through stages, from basic acoustics to phonetic patterns to linguistic structure.

By applying distinct momentum factors to different frequency bands in parameter space, FAM gives the optimizer domain-specific audio information it would not otherwise have.

Download and test it for free! :D
https://github.com/sine2pi/FAMOptimizer

Usage example

```python
# FAMOptimizer, FAMScheduler, and get_parameter_groups come from the
# repository linked above.
param_groups = get_parameter_groups(model=model, lr=0.001, weight_decay=1e-6)

optimizer = FAMOptimizer(
    params=param_groups,
    beta=0.99,            # base momentum factor
    n_bands=10,           # number of frequency bands in gradient space
    fam_start_step=100,   # FAM is inactive until this step
    layer_boost=True,
    min_size=128,
    debug=True,
    weight_decay=0.0025,
    lr=0.001,
)

scheduler = FAMScheduler(
    optimizer=optimizer,
    warmup_steps=100,
    total_steps=10000,
    decay_start_step=100,
)
```
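For intuition, here is a minimal sketch of the core mechanism, not the repository's actual implementation: the gradient is treated as a 1-D signal, its magnitude spectrum is split into bands, and the band energies nudge the effective momentum factor. The helper names `frequency_band_energies` and `adaptive_beta`, and the specific adjustment rule, are illustrative assumptions.

```python
import torch

def frequency_band_energies(grad: torch.Tensor, n_bands: int = 10) -> torch.Tensor:
    """Mean magnitude-spectrum energy per frequency band of a flattened
    gradient, normalized to sum to 1 (illustrative helper, not library code)."""
    flat = grad.detach().flatten().float()
    spectrum = torch.abs(torch.fft.rfft(flat))      # magnitude spectrum of the gradient
    bands = torch.tensor_split(spectrum, n_bands)   # contiguous frequency bands
    energies = torch.stack([band.mean() for band in bands])
    return energies / (energies.sum() + 1e-12)

def adaptive_beta(grad: torch.Tensor, base_beta: float = 0.99) -> float:
    """Hypothetical adjustment rule: gradients dominated by high-frequency
    content (noisier updates) receive slightly stronger momentum smoothing."""
    energies = frequency_band_energies(grad)
    weights = torch.linspace(0.0, 1.0, energies.numel())  # low -> high frequency
    hf_fraction = float((energies * weights).sum())       # roughly in [0, 1]
    return min(0.999, base_beta + (1.0 - base_beta) * 0.5 * hf_fraction)

# One momentum step with the frequency-adapted factor (standard EMA form):
grad = torch.randn(4096)          # stand-in for one parameter's gradient
exp_avg = torch.zeros_like(grad)  # momentum buffer
beta_eff = adaptive_beta(grad)
exp_avg.mul_(beta_eff).add_(grad, alpha=1.0 - beta_eff)
```

In the actual optimizer this per-band analysis would run per parameter tensor (subject to `min_size`) and only once `fam_start_step` has been reached; before that, updates fall back to plain momentum.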