RichardErkhov/FATLLAMA-1.7T-Instruct

#359
by RichardErkhov - opened
RichardErkhov changed discussion status to closed
RichardErkhov changed discussion title from RichardErkhov/FATLLAMA-1.7T-Instruct to -------

Why did you remove your request for https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct? Have you tried it and were you not satisfied with the model’s quality? Have you already started quantizing it yourself?

Why would anyone create FatLlama-1.7T? I mean, seriously, what’s the point? You wake up one day and think, “You know what we need? A model so massive that even the clouds get nervous.” It’s like deciding to build a rocket just to go to the grocery store.

If this beats BigLlama 1T, it would be the world’s best openly available LLM, which would be a huge deal. The resources required to run such massive AI models are worth it if they generate output of better quality. There are many tasks where quality is way more important than quantity, especially in the corporate world, where the cost of running such a model is almost negligible.

Sure, it's impressive, but who’s running it? Probably not you, unless your PC is secretly a nuclear reactor.

My PCs at home can likely run this at Q4_XXS, as I can run 405B at F16. You really don't need such crazy hardware to run a model like this. A few old decommissioned servers you can get for cheap, connected together over RPC, will do. You just need around 1 TB of RAM in total. All modern servers support 1 TB of RAM, so a single ordinary server with that much RAM works too. You could get DDR5 server RAM for around $2500/TB if you find a decent deal, which is not much more expensive than an RTX 4090.
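For a rough sense of why ~1 TB is the right ballpark, here is a quick back-of-envelope in Python. The bits-per-weight figures are approximate and ignore KV cache and runtime overhead:

```python
# Rough memory estimate for running a ~1.7T parameter model at various quants.
# Bits-per-weight figures are approximate and ignore KV cache and runtime overhead.
PARAMS = 1.7e12  # parameter count

bits_per_weight = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_S": 4.6,   # roughly the ~4-bit range mentioned above
    "IQ2_XXS": 2.1,
    "IQ1_M": 1.8,
}

for quant, bpw in bits_per_weight.items():
    tib = PARAMS * bpw / 8 / 2**40
    print(f"{quant:8s} ~{tib:5.2f} TiB")
```

A ~4-bit quant of 1.7T parameters lands just under 1 TiB, which is where the "around 1 TB of RAM in total" figure comes from.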

And what’s it going to do? Maybe predict your emails before you even think of writing them, or just become really good at finding cat videos. The real question is: Are we creating these gigantic models because we can... or because we’ve got something to prove to the universe? At this point, it’s less AI and more “hold my beer, I’m gonna run this thing.”

If I can ask this model a question and on average get a better answer than with BigLlama 1T, it will be worth it. The time I waste reading low-quality answers and the resources required to regenerate bad answers are way more expensive than locally running such a massive model.

So there it is, FatLlama-1.7T, taking up all your hard drive space like it’s a vacation rental that overstays its welcome. Forget about saving family photos or, you know, literally anything else. Hope you didn’t need that 3TB of free space—you’ve got a digital behemoth now.

It will not take up more space than BigLlama 1T currently does, as I would just store it at a lower quant. I only have 896 GiB of RAM, so storing any model at a quant larger than that would be pointless. If it turns out to be better than BigLlama 1T, it would obviously also replace it.

Quants? Yeah, good luck with that. I tried to quantize it, and my computer just laughed at me and went back to running Minesweeper. It’s like trying to shove a mattress into a filing cabinet—not happening.

We could definitely quant it if the model is any good and worth quantizing, and doing so wouldn't even take that long. Computing the imatrix would be a bit of a pain, as it requires actually running the model, but we can do the imatrix computation at a lower quant like we did for BigLlama 1T.
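For reference, computing the imatrix against a lower quant boils down to pointing llama.cpp's llama-imatrix tool at that quant instead of the full-precision GGUF. A minimal sketch, with hypothetical file names:

```python
# Minimal sketch: compute an importance matrix (imatrix) by running a lower
# quant through llama.cpp's llama-imatrix tool. All file names are hypothetical.
import subprocess

subprocess.run(
    [
        "./llama-imatrix",
        "-m", "FATLLAMA-1.7T-Instruct.Q4_K_S.gguf",  # the lower quant we actually run
        "-f", "calibration.txt",                     # calibration text fed through the model
        "-o", "FATLLAMA-1.7T-Instruct.imatrix",      # resulting imatrix file
    ],
    check=True,
)
```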

But hey, maybe one day someone will figure out how to get this thing slimmed down to IQ-1 quant, where it’ll finally fit on something that’s not the size of a small country’s power grid. Imagine that: running FatLlama on your home rig, like it’s no big deal. It’ll probably be the same day pigs fly, or, in this case, llamas. But until then, we’ll keep dreaming... and buying more external hard drives, because apparently, we’re all data hoarders now.

We could do IQ1 quants of it within a day or so, and as mentioned before, even running Q4_XXS on my local rig should be possible right now. If you are willing to scrape together a few old decommissioned servers or spend a few grand, you could obtain hardware capable of running it as well.
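Producing the IQ1 quants themselves would then just be llama-quantize runs against the SOURCE GGUF with the precomputed imatrix; a sketch with hypothetical file names:

```python
# Sketch: produce IQ1-class quants from the SOURCE GGUF with a precomputed
# imatrix using llama.cpp's llama-quantize. File names are hypothetical.
import subprocess

SOURCE = "FATLLAMA-1.7T-Instruct.SOURCE.gguf"
IMATRIX = "FATLLAMA-1.7T-Instruct.imatrix"

for qtype in ("IQ1_S", "IQ1_M"):
    subprocess.run(
        [
            "./llama-quantize",
            "--imatrix", IMATRIX,
            SOURCE,
            f"FATLLAMA-1.7T-Instruct.{qtype}.gguf",
            qtype,
        ],
        check=True,
    )
```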

In the meantime, FatLlama just sits there, taunting you with its untouchable size, like that box of cookies you said you wouldn’t eat. Maybe it’ll eventually do something useful, like solve world hunger, or more realistically, it’ll just become the best meme-generator the world has ever seen. Because let’s be honest, that’s the true endgame for AI anyway—perfect memes, instantly.

We will see about that once I try it out in a few days.

Welp, if by some miracle you actually manage to get FatLlama-1.7T up and running, don’t get too comfy—because you know what's next, right? FatLlama 3T. Why? Because who doesn’t want to flex with even more ridiculous numbers? It’s like saying, “Oh, you lifted 1.7 trillion? Cute. Try 3 trillion, champ.” By the time you’re done maxing out your power grid and turning your house into a data center, I’ll be onto FatLlama 5.8T, which will probably require a small star as an energy source. Challenge accepted? Or should we just call NASA now?

Hell no. This is getting absolutely ridiculous. I'm almost certain that if you go beyond 1.7T, quality will no longer improve in any meaningful way. I’m already skeptical whether there is any significant improvement compared to BigLlama 1T. Beyond 1.7T you really start getting to the point where most normal people will struggle to run this on their home setup. Then again, you can always buy more servers and connect them over RPC to run it anyway, but it will get quite slow. I can definitely say that beyond 1.7T there will for sure not be any imatrix quants from us.

Lol. You can do it. I just received a lot of negative feedback when I told people about it and decided people are just not ready for 1.7T. The model passes the dummy check, so you can do the GGUF.

RichardErkhov changed discussion status to open
RichardErkhov changed discussion title from ------- to RichardErkhov/FATLLAMA-1.7T-Instruct

What can I say, I don't have enough storage. I have a bunch of RAM lol, so I guess it's now your time to shine @nicoboss

The model size is actually going to be a challenge even for us, as it is uncommon to have such massive models. While there is 18 TB of M.2 storage in StormPeak, most of it is in use. We use a 4 TB SSD (2x 2 TB RAID 0) for spool, which I always reserve for mradermacher to quantize on. This model will likely be around 3.45 TB, almost filling that. Maybe with compression it will take around 3 TB (or hopefully even less), leaving only around 1 TB empty. We have another 4 TB of HDD storage on upool, currently in use for the Llama 405B eval, but it will be empty in 1 day.

We could download FATLLAMA-1.7T-Instruct to spool, make hf_to_gguf store the SOURCE GGUF on upool, delete the base model from spool and store the quants on spool as usual. But the issue with that is that upool is slow. Like 100 MB/s slow, meaning every quant would probably take around 12 hours, assuming llama.cpp manages to max out the sequential HDD read speed. Another option would be to temporarily move the contents of either apool or bpool to upool so we have another 4 TB SSD at our disposal, but before I can do so I need to wait for the Llama 405B eval to be completed. Or maybe I should finally move a real HDD into StormPeak that, unlike the external HDD used for upool, isn't total trash.
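For a sense of scale, the 12-hour figure follows from simple throughput arithmetic (sizes and speeds are the rough estimates above):

```python
# Back-of-envelope: time for one quantization pass if the SOURCE GGUF has to be
# read from a ~100 MB/s HDD. Sizes and speeds are rough estimates.
source_gguf_tb = 3.45   # approximate size of the F16 SOURCE GGUF in TB
hdd_mb_per_s = 100      # sustained sequential read speed of upool

hours = source_gguf_tb * 1e6 / hdd_mb_per_s / 3600
print(f"~{hours:.1f} hours per quant just to read the source once")  # ~9.6 h, so ~12 h with overhead
```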

Hi @mradermacher. I started moving things from bpool to upool so you can soon store the FATLLAMA-1.7T-Instruct SOURCE GGUF on bpool. I recommend you already start downloading the model to spool, but it will likely take at least 12 hours for everything on bpool to be moved to upool. I still have the Qwen 2.5 SOURCE GGUFs on upool and the large ones on cpool, so you can delete them from spool now, as otherwise storage will probably get too tight there. Regarding the 405B eval, don't worry: I now mostly run it from cpool, and I will do the same for the Qwen 2.5 series. I decided to run the 405B eval overnight, so it should be done by tomorrow noon.

I have now freed up enough space on bpool that the compressed FATLLAMA-1.7T-Instruct SOURCE GGUF should fit. I mounted it to your LXC container under /bpool. I recommend you download the model to spool, run hf_to_gguf with /bpool as the output, and then soft link the SOURCE GGUF from /bpool to /tmp, after which you can run your static quant scripts as usual. This should make it much easier for us to deal with this massive model and process it in a reasonable time.
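A rough sketch of that flow, assuming the usual llama.cpp conversion script; the repo id is real, but all paths and the exact invocation are illustrative:

```python
# Sketch of the proposed flow: download to spool, convert the model to a SOURCE
# GGUF on /bpool, then soft-link it into /tmp for the usual static quant scripts.
# All paths and the exact conversion invocation are assumptions.
import os
import subprocess
from huggingface_hub import snapshot_download

# 1. Download the model repo to the fast SSD pool.
model_dir = snapshot_download(
    "RichardErkhov/FATLLAMA-1.7T-Instruct",
    local_dir="/spool/FATLLAMA-1.7T-Instruct",
)

# 2. Convert to a single SOURCE GGUF, writing the output to /bpool.
source_gguf = "/bpool/FATLLAMA-1.7T-Instruct.SOURCE.gguf"
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", model_dir, "--outfile", source_gguf],
    check=True,
)

# 3. Soft-link the SOURCE GGUF to where the quant scripts expect it.
os.symlink(source_gguf, "/tmp/FATLLAMA-1.7T-Instruct.SOURCE.gguf")
```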

You just need around 1 TB RAM in total.

You keep making my day, nico :-)

Anyway... yeah, not being in the default locations is an issue for my scripts, so some manual work will be needed. I assume spool is actually my /? I could start downloading to /bpool right away; if the space runs out, I can resume (once the 12 hours are over). That might be better than filling up / and then having to move. Also, is /bpool actually big enough?

mradermacher changed discussion status to closed

executes from disk using 1 GB/s of SSD

This is certainly what happened, but something is still strange: surely nothing is routed in userspace, i.e. skbufs and kernel code are non-swappable (and in my personal experience, the machine might seem dead and completely unresponsive, but it will still route happily, as long as no userspace decision is needed, which, at 10 Gbit/s, is certainly not something you would do per packet).

You must have hit the exact sweet spot for this :)

@RichardErkhov the imatrix Q2_K is incoming, and the rest (including your "requested" IQ1_S) will follow over the next few days or week or so :) From what I gather from what nico said, it might have a few issues with orthography, but is surprisingly intelligent. Guess you need to finetune it now :-)

You can watch it at http://hf.tst.eu/status.html if I haven't mentioned that yet.

@RichardErkhov there is a lovely IQ1_M available now, at the low cost of 390GB: https://huggingface.co/mradermacher/FATLLAMA-1.7T-Instruct-i1-GGUF

That's what you wanted, right? :)

@mradermacher I want IQ-1, it is -50GB model size, it owes me 50GB, try to do that please, let me know when it comes out

I can give you an empty file by quantizing everything down to zero bits, but I doubt it will satisfy. Guess we need to force @nicoboss to test the IQ1_M :)

Give 0.8 bit please

I first hardlinked IQ1_M, only to then realize that IQ2_XXS at 417 GiB easily fits as well, so I deleted IQ1_M and hardlinked that one instead. Currently IQ2_XS is being computed, and based on my calculations I have the feeling it could fit too, but we will see. In any case, now is probably a great time to try it, as all imatrix and eval tasks are done for today.

@nicoboss nah, the task was to test the iq1_m (and iq1_s) specifically. so you failed :-)
