Presenting a simple re-implementation of "Inference-time scaling diffusion models beyond denoising steps" by Ma et al.
I did the simplest random search strategy, but results can potentially be improved with better-guided search methods.
Supports Gemini 2 Flash & Qwen2.5 as verifiers for "LLMGrading" π€
The steps are simple:
For each round:
1> Starting by sampling 2 starting noises with different seeds. 2> Score the generations w.r.t a metric. 3> Obtain the best generation from the current round.
If you have more compute budget, go to the next search round. Scale the noise pool (2 ** search_round) and repeat 1 - 3.
This constitutes the random search method as done in the paper by Google DeepMind.
The conclusion is interesting: "Our findings highlight that the Gaudi 2, by leveraging FP8, achieves higher throughput-to-power efficiency during LLM inference"
One aspect of AI hardware accelerators that is often overlooked is how they consume less energy than GPUs. It's nice to see researchers starting carrying out experiments to measure this!
We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.
We are also releasing a LoRA extraction utility from a fully fine-tuned checkpoint. I know that kind of stuff has existed since eternity, but the quality on video models was nothing short of spectacular. Below are some links: