Hugging Face
Concedo
concedo
67 followers · 3 following
LostRuins
AI & ML interests
Hi, I'm Concedo, also known as LostRuins. I'm the dev of KoboldCpp.
Recent Activity
replied to bartowski's post 2 days ago
Old mixtral model quants may be broken!
Recently Slaren over on llama.cpp refactored the model loader, in a way that's super awesome and very powerful, but with it came breaking of support for "split tensor MoE models", which applies to older mixtral models.
You may have seen my upload of one such older mixtral model, ondurbin/bagel-dpo-8x7b-v0.2, and with the newest changes it seems to be able to run without issue.
If you happen to run into issues with any other old mixtral models, drop a link here and I'll try to remake them with the new changes so that we can continue enjoying them :)
replied to bartowski's post 2 days ago
Looks like Q4_0_N_M file types are going away.
Before you panic, there's a new "preferred" method, which is online (I prefer the term on-the-fly) repacking. If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable).
You can see the reference PR here: https://github.com/ggerganov/llama.cpp/pull/10446
So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speeds (though it may currently be bugged on some platforms).
As such, I'll stop making those newer model formats soon, probably end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!
Also, IQ4_NL supports repacking, though not in as many shapes yet, and should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541
Remember, these are not meant for Apple silicon, since those chips use the GPU and don't benefit from the repacking of weights.
updated a model 3 days ago: koboldcpp/mmproj
Organizations
spaces (3)
📈 DatasetExplorer • Running • 3
🚀 Web Tokenizer • Sleeping • 5
😻 Koboldcpp KobbleTiny (Kobble Tiny) • Running • 6
models (22)
concedo/Beepo-22B-GGUF • Updated Nov 17 • 4.09k • 29
concedo/Beepo-22B • Updated Nov 17 • 239 • 46
concedo/Mini-Magnum-Unboxed-12B-GGUF • Updated Aug 11 • 96 • 3
concedo/Mini-Magnum-Unboxed-12B • Updated Aug 11 • 15 • 5
concedo/koboldcpp • Text Generation • Updated Aug 7 • 350 • 5
concedo/KobbleSmall-2B-GGUF • Updated Aug 5 • 21 • 3
concedo/KobbleSmall-2B • Updated Aug 5 • 4 • 3
concedo/Phi-SoSerious-Mini-V1-GGUF • Updated May 22 • 33 • 6
concedo/Phi-SoSerious-Mini-V1 • Text Generation • Updated May 22 • 11 • 8
concedo/KobbleTinyV2-1.1B-GGUF • Updated May 3 • 4.21k • 14
datasets
None public yet