Dokyoon

leeloolee

AI & ML interests

ai

Recent Activity

upvoted a paper 9 days ago
GUI Agents: A Survey
reacted to m-ric's post with 👍 15 days ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years on a single GPU.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates; this shall delay the building of your computing temple by many moons."

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months. This required parallelizing across 4 dimensions: data, tensor, context, pipeline. And it is infamously hard to do, making for bloated code repos that hold together only by magic.

🤏 But now we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry. And now a team releases Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code, a real feat of engineering that makes it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful: measured in MFU (Model FLOPs Utilization, the fraction of the hardware's peak compute that the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this.)

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
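The GPU-hour figures quoted in the post can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the two input numbers come from the post, while the function names, the 365-day year, and the assumption of perfectly linear scaling across GPUs are illustrative simplifications, not anything the post or Picotron defines.

```python
# Back-of-the-envelope check of the figures quoted in the post.
GPU_HOURS = 39_000_000     # Llama-3.1-405B training cost, as quoted in the post
NUM_GPUS = 24_000          # "24k H100s", as quoted in the post
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365  # assumption: a plain 365-day year


def single_gpu_years(gpu_hours: float) -> float:
    """Years needed if a single GPU did all the work sequentially."""
    return gpu_hours / HOURS_PER_YEAR


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Days of wall-clock time, assuming ideal (perfectly linear) scaling."""
    return gpu_hours / num_gpus / HOURS_PER_DAY


print(f"{single_gpu_years(GPU_HOURS):,.0f} years on one GPU")           # 4,452 years
print(f"{wall_clock_days(GPU_HOURS, NUM_GPUS):.0f} days on 24k GPUs")  # 68 days
```

Under these assumptions the result is roughly 4.5 thousand single-GPU years and a little over two months of wall-clock time, which is consistent with the post's "a few months" once real-world overheads are taken into account.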

Organizations

sionic-ai, Multi🤖Transformers, 인스트럭트.한국, AI Safeguard

leeloolee's activity

upvoted an article about 1 month ago
Article

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

By mikelabs • 11
upvoted an article about 1 month ago
Article

Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK

• 35
upvoted an article 2 months ago
Article

Running Large Multimodal Models on an AI PC's NPU

By bconsolvo • 14