Would you be willing to do this on a smaller model?
As above, for people like myself who cannot seem to get llama.cpp working on Windows? Say Mistral Small 22b, Qwen 32b, or something similar in the 20-35b range?
Cheers
Fancy seeing you here ;D
Fancy seeing you here ;D
Hehehe
I do plan to try other models, yeah.
Qwen 32b
I actually gave Qwen-2.5-72b a try, but the result wasn't so good. I suspect the Qwen models haven't been trained on as many books as Mistral and Command-R, so the resulting model couldn't easily write in the style of specific authors.
Mistral small 22b
I'm actually working on a 22b at the moment, experimental/custom architecture. If it's coherent enough, I'll release it.
Other than that, I was planning to try gemma2-27b as it can be coaxed into writing pretty well, though it only has an 8192-token context.
cannot seem to get llama.cpp working on Windows
This should be fixable by the way. I haven't got a Windows machine myself, but I'm pretty sure llama.cpp can run on pretty much anything. There were some big changes to llama.cpp earlier this month where they deprecated `make`, and it was a bit unstable for a week after that. But I'm betting the latest main branch will build fine with `cmake`.
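If you do give it another shot, the whole build should just be something like this (untested by me on Windows, so treat it as a sketch; you'd need Git, CMake, the Visual Studio C++ build tools and the CUDA toolkit installed first):

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

The binaries should end up under build/bin (or build/bin/Release on Windows).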
That sounds great, I look forward to seeing whatever you release.
I tried a few months ago and had some luck with llama.cpp, in that I got it to load a model, but I just couldn't get it to work with CUDA/cuBLAS no matter what I did. I'll be honest though, I am really strapped for time, having 2 young kids and working full time. But I'll give it another go; if you know of any decent guides out there (unlikely, as you don't use Windows), let me know.
Anyway Merry Christmas!
I've released the 22b I mentioned.
It's quite experimental though, might not be what you're looking for.
(It's actually based on WizardLM2-8x22b, i.e. the Mixtral-8x22b architecture, rather than Mistral-Small.)
It includes the same dataset as this model, but I've branded it as "RP" because it's not as knowledgeable as the 123b model here and can be unstable at short contexts (whereas RP usually has at least a few hundred tokens of "Lore" before the first message).
I also made a brief attempt at a Qwen2.5-32b with the same dataset as this model (Writer-Large-2411), but it turned out quite bland compared to Mistral-based finetunes.
I don't think Qwen was trained on as many books as Mistral.
RE llama.cpp, I don't suppose their pre-built Windows/CUDA binaries work?
https://github.com/ggerganov/llama.cpp/releases
llama-b4402-bin-win-cuda-cu11.7-x64.zip
llama-b4402-bin-win-cuda-cu12.4-x64.zip
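If one of those zips runs for you, it should just be a matter of unzipping it and pointing the server at a GGUF file, something like this (the model filename here is just a placeholder):

```
llama-server -m Writer-Large-2411-Q4_K_M.gguf -ngl 99 -c 8192
```

where -ngl is the number of layers to offload to the GPU and -c is the context size.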
Anyway, Happy New Year!
That's awesome man! I've used WizardLM 8x22b on OpenRouter and loved it. So this could be well up my street. I'll give the binaries a go again. Thanks mate and happy new year to you!
So I gave it a go and you are correct it does seem pretty broken, sadly.
I'm not sure if this will help but the guy who does the dolphin models did a similar tune you could try and work with?
cognitivecomputations/dolphin-2.9.1-mixtral-1x22b
I'm looking to do some tunes myself, but I still need to get over my fear of walls of text and command-line stuff (ADHD lol), and find the time. No idea where to start either.
So I gave it a go and you are correct it does seem pretty broken, sadly.
Right. It's quite sensitive to samplers but even when I get it right, there's still a chance it'll lose coherence or start swearing at me lol.
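Just to give a concrete idea of what I mean by samplers, on the llama.cpp side it's these kinds of flags (the filename is a placeholder and the values are only illustrative, not settings I've tuned for it):

```
llama-cli -m the-22b-Q5_K_M.gguf --temp 0.8 --min-p 0.05 --repeat-penalty 1.05
```

Small changes to temperature / min-p make a noticeable difference to whether it stays coherent.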
cognitivecomputations/dolphin-2.9.1-mixtral-1x22b
Thanks for the link. I probably should have gone with Mixtral-base as well, rather than Wizard, which is already kind of finicky with its Vicuna template.
Looks like they left this model as a MoE, but with a single expert. As such, it needs an old version of llama.cpp to run (probably why it wasn't hugely popular / didn't get EXL2/VLLM support, and I didn't even know about it!).
I compiled that old llama.cpp and tested it, seems pretty stable.
For my next project (when I have time at the end of the month), I'm going to convert their model to the Mistral architecture and upload it. I'm confident it'll remain just as stable / coherent, but it'll be compatible with ALL the tooling built around Mistral!
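The conversion itself should be mostly renaming tensors; roughly this (a sketch of the idea, assuming the standard transformers weight names, not the exact script I'll end up using):

```python
# Rough sketch: collapse a single-expert Mixtral checkpoint into a dense Mistral one.
from safetensors.torch import load_file, save_file

state = load_file("dolphin-1x22b/model.safetensors")  # placeholder path
new_state = {}
for name, tensor in state.items():
    if ".block_sparse_moe.gate." in name:
        continue  # drop the router; with one expert it has nothing to route
    # The lone expert's FFN maps onto Mistral's dense MLP:
    # w1 -> gate_proj, w3 -> up_proj, w2 -> down_proj
    name = (name
            .replace(".block_sparse_moe.experts.0.w1.", ".mlp.gate_proj.")
            .replace(".block_sparse_moe.experts.0.w3.", ".mlp.up_proj.")
            .replace(".block_sparse_moe.experts.0.w2.", ".mlp.down_proj."))
    new_state[name] = tensor

save_file(new_state, "mistral-22b-dense/model.safetensors")
# config.json then needs MixtralForCausalLM -> MistralForCausalLM, and the
# MoE fields (num_local_experts, num_experts_per_tok) removed.
```

Everything else (attention, norms, embeddings) is already identical between the two architectures.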
After that, I/we (the community) should be able to use it as a base for training and produce Apache-2.0-licensed 22b models!
cognitivecomputations have already trained it on a lot of data with those 8xH100s and taught it the ChatML template (which gives us system prompts, multi-turn conversations, etc).
Again, you have to compile an older version of llama.cpp to run it, but it's definitely more stable than mine.
I think going with Wizard instead of Mixtral was a bit ambitious given my limited budget.
I'm looking to do some tunes myself, but I still need to get over my fear of walls of text and command-line stuff (ADHD lol), and find the time. No idea where to start either.
In that case, I'd suggest starting with unsloth in Google Colab. They provide example notebooks which you can run without touching the CLI. Notebooks break the code up into "cells" you can run and modify at will, so you won't have a 300-line Python script to manage.
Google Colab is free if you have a Google/Gmail account, and you can save the notebooks to Google Drive and upload only the artifacts (LoRAs, datasets and models) to Hugging Face.
https://docs.unsloth.ai/get-started/unsloth-notebooks
(I'd start with small instruct models like Mistral-Nemo-Instruct or Gemma-2-9b-Instruct, and if you want to try vision, Qwen2-VL-7B-Instruct.)
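To give you an idea of the scale of it, the core of one of those notebooks boils down to a handful of cells, roughly like this (a trimmed-down sketch in the style of their examples, not a complete notebook; the dataset filename is a placeholder):

```python
# Minimal QLoRA fine-tune, following the pattern of the unsloth example notebooks.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Model name taken from unsloth's catalogue; check the notebook for the exact one.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Mistral-Nemo-Instruct-2407",
    max_seq_length=4096,
    load_in_4bit=True,  # QLoRA: fits on a free Colab T4
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank; higher = more capacity, more VRAM
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

dataset = load_dataset("json", data_files="my_dataset.jsonl", split="train")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
model.save_pretrained("lora_model")  # the LoRA you'd upload to huggingface
```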
All data left behind on the colab instance gets deleted, so you won't end up with an SSD full of 80%-finished projects and broken conda environments :)
All data left behind on the colab instance gets deleted, so you won't end up with an SSD full of 80%-finished projects and broken conda environments :)
Sounds like someone speaking from personal experience....
I still think it has some potential though, maybe more fine tuning?
Definitely an interesting model though, and a fun experiment.
If you went the Mistral route, I guess you could use the Wizard dataset and use the Mistral instruct template to train it? I dunno, I'm nowhere near as knowledgeable as you.
Thanks for all this! You and the other guys on here are doing great work, doing it all for free, and now even passing knowledge on!
I have looked at Unsloth. I have considered getting a few books I like, converting them to text, and doing some text-completion training. It would have to be a private model due to copyright.
I still think it has some potential though, maybe more fine tuning?
Oh yeah, I haven't given up yet. I have some more ideas to try for sure. I should have more $ for GPU instances, and free time, towards the end of the month.
I guess you could use the Wizard dataset and use the Mistral instruct template to train it?
The WizardLM2 datasets weren't released, and even if they were, that'd cost more than I'll make in my lifetime to train :D
I still think I can further heal this one. It'd be easier if I did what dolphin did and kept the router + MoE architecture, but I’m really keen to have this as a proper dense model with the Mistral architecture.
I have looked at Unsloth. I have considered getting a few books I like, converting them to text, and doing some text-completion training. It would have to be a private model due to copyright.
It's worth checking out this project then. I've used it for generating RP datasets before. It's got a feature to convert manuals/books into pretraining data.
https://github.com/e-p-armstrong/augmentoolkit
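And if you just want raw text-completion training, the data prep side is honestly the easy part. Something like this would get a book into shape for the notebooks above (a sketch; the filenames are placeholders and the chunk sizes are just a guess):

```python
# Chunk a plain-text book into overlapping passages for completion training.
import json

CHUNK_CHARS = 8000  # very roughly 2k tokens
OVERLAP = 500       # overlap so passages don't always start mid-scene

with open("my_book.txt", encoding="utf-8") as f:
    text = f.read()

with open("book_dataset.jsonl", "w", encoding="utf-8") as out:
    start = 0
    while start < len(text):
        out.write(json.dumps({"text": text[start:start + CHUNK_CHARS]}) + "\n")
        start += CHUNK_CHARS - OVERLAP
```

Keeping it private is easy enough too: train in Colab and either upload the LoRA to a private huggingface repo or don't upload it at all.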
Thanks for all this! You and the other guys on here are doing great work, doing it all for free, and now even passing knowledge on!
No worries :)