From 10 Terabytes to Zero Parameter: The LLM 2.0 Revolution https://mltblog.com/4g2sKTv
LLM 2.0 has been brewing for a long time. Now it is becoming mainstream and replacing LLM 1.0, thanks to its ability to deliver better ROI to enterprise customers at a much lower cost. Much of the past resistance to its adoption lay in one question: how can you possibly do better with no training, no GPU, and zero parameters? It is as if everyone believed that multi-billion-parameter models were mandatory, simply because of long tradition.
However, all that machinery is used to train models on tasks irrelevant to the end purpose, relying on self-reinforcing evaluation metrics that fail to capture desirable qualities such as depth, conciseness, or exhaustiveness. Not that standard LLMs are bad: I use OpenAI and Perplexity a lot for code generation, for writing my investor deck, and even to answer advanced number theory questions. But their strength comes from all the sub-systems they rely upon, not from the central deep neural network. Remove or simplify that part and you get a product that is far easier to maintain and upgrade, costs far less to develop, and, if done right, delivers more accurate results without hallucinations, without prompt engineering, and without the need to double-check answers for the subtle errors that are easy to overlook.
Good LLM 1.0 still saves a lot of time, but it requires significant vigilance. There is plenty of room for improvement, yet more parameters and black-box DNNs have shown their limitations.
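To make the "zero parameters" idea concrete, here is a minimal, purely illustrative sketch of a training-free retrieval component, assuming a simple inverted-index keyword approach. The article excerpt does not spell out LLM 2.0's internals, so the corpus, function names, and scoring rule below are hypothetical, not the author's actual architecture:

```python
# Illustrative only: a tiny zero-parameter keyword retriever over an in-memory corpus.
# It shows that useful retrieval over a fixed corpus can run with no training,
# no GPU, and no learned weights, using plain indexing and counting.

from collections import defaultdict

corpus = {  # hypothetical documents
    "doc1": "LLM 2.0 relies on indexing and retrieval rather than a trained neural network.",
    "doc2": "Multi-billion parameter models require GPUs and long training runs.",
    "doc3": "Evaluation metrics often fail to capture depth, conciseness, and exhaustiveness.",
}

# Build an inverted index: token -> set of document ids (nothing to learn).
index = defaultdict(set)
for doc_id, text in corpus.items():
    for token in text.lower().split():
        index[token.strip(".,")].add(doc_id)

def retrieve(query: str):
    """Score documents by how many query tokens they contain."""
    scores = defaultdict(int)
    for token in query.lower().split():
        for doc_id in index.get(token.strip(".,"), ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(retrieve("training runs on a GPU"))
# -> [('doc2', 2), ('doc1', 1)], ranked purely by keyword overlap, no model weights involved.
```

The point is not this particular scoring rule but the fact that everything in such a pipeline is explicit and auditable: no weights to train, nothing to run on a GPU.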
➡️ To read the full article and learn how LLM 2.0 changes the game, see https://mltblog.com/4g2sKTv