Sin2pi / FAMOpimizer

	---
	license: mit
	base_model:
	- openai/whisper-large-v3-turbo
	tags:
	- asr
	- optimizer
	- speech
	- audio
	- frequency
	---

	Proof of concept, in beta... or theta.

	This optimizer is designed for ASR-type models, but it also works well with FAM disabled. FAM is turned on by step count through `fam_start_step` (for example, `fam_start_step=100`).

	FAM is an experimental approach designed for speech recognition tasks: it adapts momentum based on the frequency characteristics of gradient updates.

	### Frequency-Adaptive Momentum (FAM)

	#### Core Concept

	- Speech signals possess an inherent frequency structure, with different parts of the model responding to various frequency bands. This frequency structure remains preserved, albeit transformed, when converted to log-mel spectrograms, with model parameters adapting to capture this structure.
	- The Chain of Frequency Information: Original Audio → Log-Mel Spectrogram → Encoder Parameters → Gradient Updates.
	- Empirical observations reveal that transformer-based speech models develop:
		- Lower encoder layers with filters responsive to specific frequency bands in the mel spectrogram.
		- Attention heads tracking particular acoustic patterns over time.
		- A hierarchical representation from acoustic features to phonetic units to words.
	- FAM aims to integrate a momentum scheme that adapts based on the "frequency signature" of gradient updates.
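
To make the "frequency signature" idea concrete, here is a minimal sketch. It is not the FAMOptimizer implementation. It treats a flattened gradient as a 1-D signal, takes its discrete Fourier transform, and reports the share of energy that falls into each band. The helper names `dft_magnitudes` and `band_energies` and the equal-width banding are assumptions made for illustration. A naive pure-Python DFT keeps the snippet dependency-free; in practice something like `torch.fft.rfft` would be the likely choice.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X[k]| for k = 0 .. N//2 using a naive O(N^2) DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def band_energies(grad, n_bands=4):
    """Split the one-sided spectrum of a flattened gradient into n_bands
    contiguous bands and return each band's share of the total energy.
    (Hypothetical helper; equal-width bands are an assumption.)"""
    power = [m * m for m in dft_magnitudes(grad)]
    total = sum(power) or 1.0  # avoid division by zero for an all-zero gradient
    edges = [round(i * len(power) / n_bands) for i in range(n_bands + 1)]
    return [sum(power[edges[i]:edges[i + 1]]) / total for i in range(n_bands)]


# Two toy "gradients": one varies slowly across the parameter index,
# the other flips sign at every index.
n = 64
smooth = [math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [(-1.0) ** t for t in range(n)]

smooth_sig = band_energies(smooth)
noisy_sig = band_energies(noisy)
```

In this toy setup `smooth_sig` concentrates its energy in the first band and `noisy_sig` in the last. A signature like this is the kind of signal a frequency-aware momentum rule could react to.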

	#### Why This Optimizer Makes Sense

	FAM acknowledges the frequency structure within the optimization process itself, recognizing that:
	- Gradient Frequencies Matter: The Fourier transform of gradient updates reveals patterns linked to the model's current learning phase.
	- Different Parameters Process Different Bands: Similar to how our ears have frequency-specific receptors, different parts of the model specialize in various acoustic frequencies.
	- Temporal Structure in Learning: Speech learning progresses through stages - from basic acoustics to phonetic patterns to linguistic structures.

	By applying distinct momentum factors to different frequency bands in parameter space, FAM provides the optimizer with domain-specific audio information that it otherwise wouldn't have.
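
One way to picture "distinct momentum factors per frequency band" is the sketch below. It is an illustration under stated assumptions, not the algorithm used in the repository. It keeps the momentum buffer in the frequency domain and applies an exponential moving average with a different `beta` per band. The function names (`dft`, `idft`, `band_of`, `fam_like_step`), the banding rule, and the choice of smoothing high-frequency bands more heavily are all hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (all N bins)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def band_of(k, n, n_bands):
    """Map DFT bin k to a band by its frequency min(k, n - k), so that
    conjugate bins share a band and the filtered result stays real."""
    freq = min(k, n - k)  # ranges over 0 .. n // 2
    return freq * n_bands // (n // 2 + 1)


def fam_like_step(momentum_spec, grad, betas):
    """One EMA momentum update in the frequency domain, one beta per band.
    Returns the new spectral buffer and the resulting update direction."""
    n = len(grad)
    grad_spec = dft(grad)
    new_spec = []
    for k in range(n):
        beta = betas[band_of(k, n, len(betas))]
        new_spec.append(beta * momentum_spec[k] + (1.0 - beta) * grad_spec[k])
    return new_spec, idft(new_spec)


# Toy gradient: a slow component plus a fast, sign-flipping component.
n = 32
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
fast = [(-1.0) ** t for t in range(n)]
grad = [s + f for s, f in zip(slow, fast)]

betas = [0.5, 0.99]  # assumed values: low band reacts quickly, high band is smoothed
spec = [0j] * n      # momentum buffer starts at zero
spec, update = fam_like_step(spec, grad, betas)
```

Starting from an empty buffer, the first update keeps half of the slow component and only about one percent of the fast one. That is the effect of giving the two bands different momentum factors.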

	Download and test it for free! :D

	https://github.com/sine2pi/FAMOptimizer

	Usage example
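	```python
	# Assumes `model` is an already-constructed model and that FAMOptimizer,
	# FAMScheduler, and get_parameter_groups have been imported from the
	# FAMOptimizer repository linked above.

	# Build the parameter groups that are passed to the optimizer.
	param_groups = get_parameter_groups(model=model, lr=0.001, weight_decay=1e-6)

	optimizer = FAMOptimizer(
	    params=param_groups,
	    beta=0.99,
	    n_bands=10,
	    fam_start_step=100,  # step count at which FAM is turned on
	    layer_boost=True,
	    min_size=128,
	    debug=True,
	    weight_decay=0.0025,
	    lr=0.001,
	)

	scheduler = FAMScheduler(
	    optimizer=optimizer,
	    warmup_steps=100,
	    total_steps=10000,
	    decay_start_step=100,
	)
	```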
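
The scheduler arguments suggest a warmup phase followed by a decay phase. The README does not spell out the exact curve, so the sketch below is only a generic linear-warmup, cosine-decay schedule that reuses the same argument names. The function `lr_at_step`, the `min_lr` parameter, and the cosine shape are assumptions, and `FAMScheduler` itself may behave differently.

```python
import math


def lr_at_step(step, base_lr=0.001, warmup_steps=100, total_steps=10000,
               decay_start_step=100, min_lr=0.0):
    """Hypothetical schedule: linear warmup, optional hold, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly up to base_lr over the warmup steps.
        return base_lr * (step + 1) / warmup_steps
    if step < decay_start_step:
        # Hold at base_lr until decay begins (empty when the two steps coincide).
        return base_lr
    # Cosine decay from base_lr down to min_lr at total_steps.
    span = max(1, total_steps - decay_start_step)
    progress = min(1.0, (step - decay_start_step) / span)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at a few points of a 10,000-step run.
samples = {step: lr_at_step(step) for step in (0, 99, 100, 5050, 10000)}
```

With the values from the usage example, warmup ends at the same step where decay begins, so this sketch has no plateau between the two phases.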