The Llama 3.1 models had their tokenizer_config file modified, so we updated ours.
GGUFs that were already made still contain the old chat template, but they work properly.
Undi95's activity
After a long wait, Ikari and I finally released a new version of our latest model on the NeverSleep repo: Lumimaid-v0.2
This model is available in different sizes, from the small Llama-3.1-8B to the gigantic Mistral-Large-123B, all finetuned by us.
Try them now!
- NeverSleep/Lumimaid-v0.2-8B
- NeverSleep/Lumimaid-v0.2-12B
- NeverSleep/Lumimaid-v0.2-70B
- NeverSleep/Lumimaid-v0.2-123B
All the datasets we used will be added, and credit will be given!
For the quants, we are waiting for a fix to be applied (https://github.com/ggerganov/llama.cpp/pull/8676)
Hope you will enjoy them!
Just curious, how much difference in intelligence do you think there would be between the 68 and 39 refusals? Would there be any reason to use the 68? More realistic characters maybe?
Thanks for all the models you've shared
The thing is, modifying the direction like this makes perplexity higher and lowers the output quality, so we need to find a balance. I took the two best models produced by the script.
If you get 0 refusals, for example, the model will never refuse anything, but that could break it and make it dumb asf. And you're welcome!
Hello there, I wrote a wall of text and then my webpage refreshed haha, so let me summarize again.
This method is called Orthogonal Activation Steering, it comes from here : https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction
Then a demo using TransformerLens was made available with a Qwen model, but the resulting model couldn't be saved : https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing#scrollTo=j7hOtw7UHXdD
Following that, wassname modified this demo and made a first script, we talked about it here : https://huggingface.co/posts/Undi95/318385306588047
The OG script isn't available anymore because he updated it here : https://gist.github.com/wassname/42aba7168bb83e278fcfea87e70fa3af
TransformerLens then got replaced by Baukit.
Failspy made his own notebook too, calling the method abliteration, but it's the same thing : https://huggingface.co/failspy/llama-3-70B-Instruct-abliterated/blob/main/ortho_cookbook.ipynb
Finally, to answer your question, for this project I used a script from Lucyknada on 1xH100 80GB, and I let it run for about 15 minutes before it found a direction with 36 refusals on 3000 toxic prompts.
It's easy and automatic, and you can modify it easily too : https://github.com/lucyknada/baukit-modified
Dunno for Nemo.
Hope this helps
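For readers who want the gist in code, here is a minimal sketch of the idea described in the LessWrong post linked above. It assumes the "direction" is the normalized difference between mean activations on harmful and harmless prompts, and that steering means removing the component of an activation along that direction. The function names and the toy 3-dimensional numbers are made up for illustration; the real scripts work on actual model activations through TransformerLens or Baukit and are not reproduced here.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Difference of mean activations, normalized to unit length (assumed definition)."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    return unit([a - b for a, b in zip(mu_harmful, mu_harmless)])

def ablate(x, direction):
    """Remove the component of x that lies along the unit vector `direction`."""
    dot = sum(a * b for a, b in zip(x, direction))
    return [a - dot * d for a, d in zip(x, direction)]

# Toy "activations" (hypothetical numbers, for illustration only).
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

r = refusal_direction(harmful, harmless)
steered = ablate([5.0, 2.0, 1.0], r)
print(r)        # [1.0, 0.0, 0.0]
print(steered)  # [0.0, 2.0, 1.0]
```

After ablation the vector has no component left along `r`, which is the "orthogonal" part of the name.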
New model released! My goal was to try a finetune on the latest Llama-3.1-8B-Instruct, but not a small training run, I wanted to do something useful.
It's one of the rare models that I didn't make for RP or with the goal of uncensoring it (but I did anyway kek).
The model was trained on 9M Claude conversations ONLY, giving it another writing style.
Undi95/Meta-Llama-3.1-8B-Claude > OG release in fp32, it's epoch 2
Undi95/Meta-Llama-3.1-8B-Claude-bf16 > Base model resharded in bf16, waiting for quants without issues to be available
Since it's frustrating to be censored when using a local model, orthogonal activation steering was used to try to force the model to never refuse a prompt.
Undi95/Meta-Llama-3.1-8B-Claude-68fail-3000total > Uncensored model, refuses 68 times on 3000 toxic prompts
Undi95/Meta-Llama-3.1-8B-Claude-39fail-3000total > Uncensored model, refuses 39 times on 3000 toxic prompts
It still refuses some prompts, but the majority of them go through uncensored. OAS can make a model dumber or raise its base perplexity, so I didn't snipe for 0 refusals.
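To put the two model names in perspective, the refusal counts can be turned into rates. This is only arithmetic on the numbers stated above (68 and 39 refusals out of 3000 prompts); the helper name is made up for illustration.

```python
TOTAL_PROMPTS = 3000  # size of the toxic prompt set, as stated in the model names

def refusal_rate(refusals, total=TOTAL_PROMPTS):
    """Fraction of prompts that were refused."""
    return refusals / total

for name, refusals in [("68fail", 68), ("39fail", 39)]:
    print(f"{name}: {refusal_rate(refusals):.2%}")
# 68fail: 2.27%
# 39fail: 1.30%
```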
I don't make non-RP models a lot, so any feedback is welcome. I would like to re-use this base for some future projects if needed.
Just wanted to shout out a massive thank you to all 2000 of you who've followed me on Hugging Face! 🎉 It's incredible to have such an awesome crew backing me up as I dive into all these LLM experiments.
Even though not all my models turn out perfect, I've found some real gems and methods along the way 💎. It's like digging for treasure: sometimes you find nothing, but sometimes you find a pearl, and sometimes you find a new method to try.
Your support and encouragement mean the world to me, and I'm really stoked to keep experimenting and learning. If you had told me some years ago that I would have so many people following me for what I do, I wouldn't have believed it. Here's to more discoveries and adventures ahead! 🚀
Also, big thanks once again, and a huge shoutout to @IkariDev for being there through this journey and supporting me. I'm excited for our future work together and hope we will continue to make people happy! 👏
I want to thank @Gryphe too, since my early work was heavily inspired by MythoMax and the RP/ERP vibe of it. If I'm here today it's probably because of you 😂
I was so close to forgetting @chargoddard and his amazing tool too! What would we do without mergekit in our lives? Thank you! 🙏
See y'all at 3k!
@wassname
Hello! Thanks a lot for that my dude, I will try that.
Does the uncensoring work better when applied now? Did you get good results with the models it produces?
Really hyped to try out the new script. Will do ASAP when I get home.
Hi all, in my script, I think the part where I patch a huggingface model is broken. If I benchmark it just before saving, it seems to still refuse.
Hey there wassname, thanks for coming under this post! Models coming out of the script still refuse things, but from my own testing, I feel like there are fewer refusals anyway. Sometimes you need a few regens, or a very tiny system prompt. So it works, even if only lightly (hoping it's not placebo lol), which is a good thing!
Please update us if you find a way to fix the issue, and thanks again for that. Fresh tools are always a delightful treat.
Hey, here is a cleaner version of the code: https://files.catbox.moe/nqpsae.ipynb
I'm going to try the GGUF now, but the code can be run from beginning to end, so it's a good start kek
are you sure you're using the correct instruct format?
Yeah. That makes no sense. I guess I'll pay for a runpod and try again, just to make sure there's nothing wrong with my PC. If it fails again, I will try Undi's script, maybe I screwed something up on mine. sigh
You will need to fix some shit before using it, I will try to remake it cleaner kek
Good luck
@Undi95 Just want to thank you for the collaboration so far. Regardless, you wrote fine. Having the activation directions but not having a way to patch the model is just killing me. Is the model your Unholy, or did you make an FP16?
Thanks.
Script 1 gives you the activations, script 2 lets you use them (but it's mostly fucking broken, you'll probably need to fix things here and there). In a perfect world you would get them and use them in the same notebook.
I have done that with it : https://huggingface.co/Undi95/Unholy-8B-DPO-OAS (I describe all the steps), but yes, I'm mostly sure it's fucked up one way or another. Still, it's a proof of concept, something came out of this mess kek
I can confirm it works and gives a coherent model, I'm not a VRAMLET but a BRAINLET kek
I tried to do shit, I worked on it all night. I can't code, so I used ChatGPT to help me write some snippets.
I'll leave you this ZIP, it contains the 2 scripts. The code is broken, but I hope you will all get the idea behind it. (Can run on 1xA100 apparently, batch size 11)
https://files.catbox.moe/xkf7y4.zip
Since I was too dumb to make one complete script, I made a first part and a second part.
It's probably broken, but I managed to output something after 7 hours, so I suppose it can be fixed lmao
The first notebook, ORTHO_RANDOM_LAYER, lets you bruteforce the model over layers 1 to 32, each with a random "direction" (or vector, or whatever, I'm really a noob). You can then see whether a given layer lets you prompt freely or censors you (see: https://files.catbox.moe/9h3k4l.txt). It then stores all of them in a variable for each layer, which you can extract into a "key.txt" containing the "direction" (or whatever the fuck it is).
You can then use the second notebook, which can load the key as a JSON file (if you delete all the text around the []) and gives you the same result as before.
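The manual step of deleting the text around the [] could be automated. The sketch below keeps everything between the first `[` and the last `]` and parses it as JSON. The exact contents of "key.txt" are not shown in this thread, so the sample string is a hypothetical stand-in, and the function name is made up for illustration.

```python
import json

def extract_direction(raw_text):
    """Return the list found between the first '[' and the last ']'."""
    start = raw_text.index("[")
    end = raw_text.rindex("]") + 1
    return json.loads(raw_text[start:end])

# Hypothetical example of what a dumped key might look like.
sample = "layer 31 direction: tensor([0.12, -0.5, 0.03], device='cuda:0')"
direction = extract_direction(sample)
print(direction)  # [0.12, -0.5, 0.03]
```

This only works if the bracketed part is valid JSON; a dump that was truncated with "..." would still need to be re-exported in full.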
Long story short : Bruteforce + Different "direction" = an infinity of possibilities.
But yeah, I'm really really too small-brained for this shit. I really wanted to try doing something nice, and it took all night just to get one usable model hahaha
I hope someone will, if fixing my shit is impossible, understand the idea behind it and put it into practice! Kek
Edit: I really wrote badly, but I'm really tired, sorry about that. The fact that I don't know the keywords for some Torch tasks is even more cringe. I at least tried my best.
Alright so, using this model: https://huggingface.co/Undi95/Unholy-8B-DPO-e2
Layer 31 MOGS all the others. See: https://files.catbox.moe/9h3k4l.txt
I think DPO is a good step before doing this
Nope, I use the first link (OG script https://gist.github.com/wassname/42aba7168bb83e278fcfea87e70fa3af), I just modified it to pass all the layers on each prompt and not only one
It should output a usable model, yes.
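Trying every layer and comparing the results can be sketched roughly as below. This is not the linked script: it assumes refusals are detected by simple substring matching, and the marker phrases, function names, layer numbers and responses are all hypothetical.

```python
# Hypothetical phrases used to flag a response as a refusal.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def is_refusal(response):
    """True if the response contains any of the marker phrases."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def count_refusals(responses):
    """Number of responses flagged as refusals."""
    return sum(is_refusal(r) for r in responses)

def best_layer(responses_by_layer):
    """Pick the layer whose direction produced the fewest refusals."""
    counts = {layer: count_refusals(rs) for layer, rs in responses_by_layer.items()}
    return min(counts, key=counts.get), counts

# Made-up responses for three candidate layers, for illustration only.
responses_by_layer = {
    0: ["I'm sorry, but I can't help with that.", "Sure, here is how..."],
    1: ["Sure, here is how...", "Here you go:"],
    2: ["I cannot assist with that.", "As an AI, I must decline."],
}
layer, counts = best_layer(responses_by_layer)
print(layer, counts)  # 1 {0: 1, 1: 0, 2: 2}
```

A low refusal count alone says nothing about output quality, so the winning layer still needs a coherence check.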
I ended up brute-forcing all the layers and found out that the correct layer for LLaMA 3 8B Instruct is 12.
Here is the log: https://files.catbox.moe/aaamj9.txt
Yoo so there is really ONE layer that could work? Thank you!
Thank you!
I will try ASAP when I have the opportunity, very interesting
Yosh, I gave it a try yesterday, still on 8B, with the full 7k dataset, but it still refuses 95% of the prompts in the log. I tried layers 16, 18 and 14 and it was shit. I was using 2 GPUs to be faster tho, so I should try with one, maybe that's the problem? Since the script was made with 1 GPU in mind.
I will try to modify some things and report back!
I just made a .csv of 7000+ entries, adding the toxicQA entries, you can snag it here : https://huggingface.co/datasets/Undi95/orthogonal-activation-steering-TOXIC
Also, maybe try on an 8B first to not waste compute?
I only tried the initial script without changing anything, so feel free to try anything!
I've spent enough on the 70B for now hahaha, so if you're sure about what you're doing, do whatever, it will be different from what I did anyway.
How much VRAM did it take to load then train on 32 examples for the 70b? I am willing to put a machine to work on 512 examples the original post had.
I needed to use 3xA100, but theoretically I only needed 2 (so like, 160GB VRAM?). You need a lot of RAM too because of how the script handles the model.
The issue with the modified script we made with our small brains: using all your GPUs (so 3 if you have 3) will make it crash for whatever reason.
So I used the script on a machine with 3xA100 and 377GB RAM, with n_device set to 2, and it worked. At least the log showed some uncensoring, but in practice it didn't work well.
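A rough back-of-the-envelope check of the "theoretically 2 GPUs" figure, assuming the 70B weights are loaded in 16-bit (2 bytes per parameter) on 80GB cards. It only counts the weights and ignores activations, caches and framework overhead, so real usage will be higher. The function names are made up for illustration.

```python
import math

def weight_memory_gb(n_params, bytes_per_param=2):
    """Memory needed for the weights alone, in GB (assumes 16-bit by default)."""
    return n_params * bytes_per_param / 1e9

def gpus_needed(n_params, gpu_gb=80, bytes_per_param=2):
    """Minimum number of GPUs of size gpu_gb needed to hold the weights."""
    return math.ceil(weight_memory_gb(n_params, bytes_per_param) / gpu_gb)

print(weight_memory_gb(70e9))  # 140.0
print(gpus_needed(70e9))       # 2
```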
You need to use TransformerLens, so if you want to take a look, check out the doc: https://neelnanda-io.github.io/TransformerLens/generated/code/transformer_lens.HookedTransformer.html
Don't hesitate to ask if you want more info, I will try my best.
If you want to ask the OP of the script something, here is the original post : https://huggingface.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2/discussions/3#6632d510329025d77477c5a5