Stas Bekman

stas

https://stasosphere.com/machine-learning/

AI & ML interests

Toolmaker. Software creator, optimizer and harmonizer. Makes things work and fly at Contextual.AI. Training LLM/RAG/Generative AI/Machine Learning/Scalability

Recent Activity

updated a model 8 days ago

stas/ml-engineering-book

posted an update about 2 months ago

If you remember my work on MAMF (Maximum Achievable Matmul FLOPS), which finds the realistic achievable TFLOPS ceiling, the Intel AI team has shared their measurements, and they scored an incredible 99.4% TFLOPS efficiency for Gaudi 2!

That's quite amazing! Your ROI on these accelerators will be very high.

The full table is here: https://github.com/stas00/ml-engineering/tree/master/compute/accelerator#maximum-achievable-matmul-flops-comparison-table

As we have seen the competitors' achievable efficiency get worse with each new generation, I'm looking forward to seeing whether Gaudi 3 will keep the bar high!

Thanks to Avi Rubin, Lakshman Chari, Imtiaz Sajwani, Ramy J and Zhiqi Tao for helping to get these numbers to the community.

updated a model 3 months ago

stas/ml-engineering-book

Articles

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

Jun 13, 2024

• 45

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

• 29

Posts 7

Post

1153

The Universal Checkpointing paper is out! https://arxiv.org/abs/2406.18820

If you remember the BigScience BLOOM-176B training, Tunji Ruwase and I co-invented this technology for Megatron-DeepSpeed to make it possible to quickly scale the node topology up and down while continuing training.

Since then, the DeepSpeed team has continued improving it, and it has now been fully integrated into DeepSpeed.

The blog post is here: https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-ucp/README.md
