@m-ric on Hugging Face: "🚀 𝗪𝗵𝗲𝗿𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 𝗹𝗮𝘄𝘀 𝗮𝗿𝗲 𝘁𝗮𝗸𝗶𝗻𝗴 𝘂𝘀 : 𝗯𝘆 𝟮𝟬𝟮𝟴…"

m-ric

posted an update Sep 5, 2024

🚀 𝗪𝗵𝗲𝗿𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 𝗹𝗮𝘄𝘀 𝗮𝗿𝗲 𝘁𝗮𝗸𝗶𝗻𝗴 𝘂𝘀 : 𝗯𝘆 𝟮𝟬𝟮𝟴, 𝗔𝗜 𝗖𝗹𝘂𝘀𝘁𝗲𝗿𝘀 𝘄𝗶𝗹𝗹 𝗿𝗲𝗮𝗰𝗵 𝘁𝗵𝗲 𝗽𝗼𝘄𝗲𝗿 𝗰𝗼𝗻𝘀𝘂𝗺𝗽𝘁𝗶𝗼𝗻 𝗼𝗳 𝗲𝗻𝘁𝗶𝗿𝗲 𝗰𝗼𝘂𝗻𝘁𝗿𝗶𝗲𝘀

Reminder: “Scaling laws” are empirical laws saying that each time you multiply your compute by 10, your models mechanically keep getting better.

To give you an idea, GPT-3 can barely write sentences, and GPT-4, which used only x15 as much compute, already sounds much smarter than some of my friends (although it's not really, or at least I haven't tested them side by side). So you can imagine how far a x100 over GPT-4 can take us.

🏎️ As a result, tech titans are racing to build the biggest models, and for this they need gigantic training clusters.

The picture below shows the growth of training compute: it is increasing at a steady exponential rate of x10 every 2 years. So let’s take this progress a bit further:
- 2022: starting training for GPT-4 : 10^26 FLOPs, cost of $100M
- 2024: today, companies start training on much larger clusters like the “super AI cluster” of Elon Musk’s xAI: 10^27 FLOPs, cost of $1B
- 2026: by then clusters will require 1 GW, i.e. around the full power generated by a nuclear reactor
- 2028: we reach cluster prices around 100 billion dollars, using 10 GW, more than the most powerful power stations currently in use in the US. This last size seems crazy, but Microsoft and OpenAI are already planning one.
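
To make the extrapolation above concrete, here is a minimal sketch of the arithmetic. It simply applies the x10-every-2-years rule to the 2022 figures quoted in the list. The function name `project` and the constants are hypothetical, and two things are assumptions of this sketch rather than claims from the picture: that cost grows in lockstep with compute, and that power follows the same x10 rule anchored on the 1 GW figure for 2026 (so the power values before 2026 are back-extrapolated, not quoted).

```python
# Hypothetical sketch: extrapolate the "x10 every 2 years" trend from the post.
BASE_YEAR = 2022
BASE_FLOPS = 1e26        # training compute quoted for 2022
BASE_COST_USD = 1e8      # $100M quoted for 2022
POWER_ANCHOR_YEAR = 2026
POWER_ANCHOR_GW = 1.0    # 1 GW quoted for 2026
GROWTH = 10              # multiplier per period
PERIOD_YEARS = 2


def project(year):
    """Return (flops, cost_usd, power_gw) for a given year.

    Assumes compute, cost and power all grow by the same factor,
    which is a simplification made for this sketch.
    """
    factor = GROWTH ** ((year - BASE_YEAR) / PERIOD_YEARS)
    power_factor = GROWTH ** ((year - POWER_ANCHOR_YEAR) / PERIOD_YEARS)
    return BASE_FLOPS * factor, BASE_COST_USD * factor, POWER_ANCHOR_GW * power_factor


for year in (2022, 2024, 2026, 2028):
    flops, cost, power = project(year)
    print(f"{year}: {flops:.0e} FLOPs, ${cost:,.0f}, {power:g} GW")
```

Under these assumptions the 2028 row lands on the $100 billion and 10 GW figures from the list, which is just a consistency check on the trend, not a forecast.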

Will AI clusters actually reach these crazy sizes where they consume as much as entire countries?
➡️ Three key ingredients of training might be a roadblock to scaling up :
💸 Money: but it’s very unlikely, given the potential market size for AGI, that investors will lose interest.
⚡️ Energy supply at a specific location
📚 Training data: we’re already using 15 trillion tokens for Llama-3.1 when the Internet has something like 60 trillion.
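
For the training data point, a quick back-of-the-envelope check of the headroom, using only the two token counts quoted above. The variable names are hypothetical, and the 60 trillion figure is the rough estimate from this post, not a measured value.

```python
# Hypothetical sketch: how much room is left in the data budget quoted above.
TOKENS_USED = 15e12        # Llama-3.1 training tokens, as quoted in the post
TOKENS_AVAILABLE = 60e12   # rough estimate of Internet text, as quoted in the post

fraction_used = TOKENS_USED / TOKENS_AVAILABLE   # share of the pool already used
headroom = TOKENS_AVAILABLE / TOKENS_USED        # max growth factor of the dataset

print(f"Already used: {fraction_used:.0%} of the pool")
print(f"Dataset can grow by at most x{headroom:g}")
```

So with these numbers the dataset could grow by x4 at most, which is less than a single x10 step if data had to grow as fast as compute (an assumption made here only for illustration).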

🤔 I’d be curious to hear your thoughts: do you think we’ll race all the way there?

lamhieu

Sep 5, 2024

Sounds interesting but I think there will be a big breakthrough, a new "architecture/methodology/factor/rethinking" for developing large models. That's what I think, I don't know what it is yet, haha.

MANOFAi94

Sep 6, 2024

edited Sep 6, 2024

I think in the near future we won't be able to run AI locally on our PCs because the models will be so big and use so much energy. Or someone will find a way to make powerful small models that use less data but are 10× better, just like LoRAs, where you only need about (correct me if I'm wrong) 15-20 images to make a model. If we could get small model checkpoints like that, able to build images from just a few images and as good as FLUX.1, this would be a game changer!! I know someone is smart enough to turn LoRAs into checkpoints, or to make something called minitensors :> Like basic lyrics: you would train the model on tokens, not images. Say you type "bird": it already knows what a bird is and looks like, so you only put in images to train the style, if you get what I'm saying :> So you would only need 1 image, I think 🤔

StephenGenusa

Sep 6, 2024

edited Sep 6, 2024

I think there will be a big breakthrough as well, but I'd be surprised if it happens soon. If it does, I'd be happy. While the architectures of LLMs continue to advance, I don't see any evidence that significant progress is being made, and I personally think the architectures are too primitive and inherently self-limiting. I am also a believer that bigger does not necessarily mean better. I think we've reached, or are close to reaching, the limit of how far size alone dictates how powerful an LLM is.

Therefore, I think, given the current architectural limitations, the external limits, namely those dictated by power availability, and the many resources/costs of building better LLMs, will slow AI development until a radical change comes along.

We've managed to survive without them and now that we have them, they are a great step forward and we'll continue using and improving what we have. There are many improvements that can be made around the LLM using NLP to improve what we expect from LLMs and that's where the focus will turn for the time being, such as xLLM. Better architectures are going to have to take into account the difference in statistical models of representations of the world and the way humans communicate through speech and writing.