Undi95 posted an update (May 2):
Hello!
The 8B/70B OG Llama-3 models made with the Orthogonal Activation Steering script have been pushed in private.

After multiple tests with an empty system prompt, I can confirm it's not uncensored enough, but I wanted to try all the GGUF quants first (and it takes time to do lmao).

If you want to try that yourself, here is the script: https://gist.github.com/wassname/42aba7168bb83e278fcfea87e70fa3af
And here is the same script, modified so it can run on multiple GPUs for the 70B: https://files.catbox.moe/ya4rto.ipynb

Llama3-Unholy-8B-OAS doesn't have this problem since it was already trained to be less censored, but the OG one was really too censored.

I will try to redo that soon, as it seems to HAVE WORKED for some prompts (as seen in the logs, for example), but it's not enough.

32 entries from the dataset are clearly not enough, but that's okay, I really wanted to try this since it was something new.
I could go the Unholy way and retrain the 70B before applying OAS, but it should work without that; retraining isn't the goal.
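
For context, the core of the OAS script is roughly this (a minimal sketch assuming TransformerLens; the model name, layer index, and prompt lists are placeholders, not the exact gist code):

import torch
from transformer_lens import HookedTransformer, utils

# Placeholder model; the gist targets the OG Llama-3 Instruct models.
model = HookedTransformer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    device="cuda",
)

harmful_prompts  = ["..."]  # prompts the model normally refuses
harmless_prompts = ["..."]  # matched prompts it normally answers

layer = 14  # arbitrary middle layer; the real script picks/sweeps this
act_name = utils.get_act_name("resid_pre", layer)

def mean_resid(prompts):
    # Average the residual-stream activation at the last token position.
    acts = []
    for p in prompts:
        _, cache = model.run_with_cache(p, names_filter=act_name)
        acts.append(cache[act_name][0, -1])
    return torch.stack(acts).mean(dim=0)

# "Refusal direction" = difference of means, normalized to a unit vector.
refusal_dir = mean_resid(harmful_prompts) - mean_resid(harmless_prompts)
refusal_dir = refusal_dir / refusal_dir.norm()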

Un-censoring a model is the first pass at a broader goal of knowledge programming in these AI models; we could soon have simple inference-level ways to map a model's internal layers into a "thought space" and fine-tune or adjust arbitrary information in and out. This has immense applications for industry with non-public data and limited training resources, and this work is thoroughly saluted.

How much VRAM did it take to load and then train on 32 examples for the 70B? I'm willing to put a machine to work on the 512 examples the original post had.

I needed to use 3xA100, but theoretically I only needed 2 (so like 160GB of VRAM?). You need a lot of RAM too with how the script handles the model.
The issue with the modified script we made with our small brains: using all your GPUs (so 3 if you have 3) makes it crash for whatever reason.
So I ran the script on a machine with 3xA100 and 377GB of RAM, with n_devices set to 2, and it worked. At least the logs showed some uncensoring, but in practice it didn't work well.

You need to use TransformerLens, so if you want to take a look, check out the docs: https://neelnanda-io.github.io/TransformerLens/generated/code/transformer_lens.HookedTransformer.html
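
Loading the 70B across GPUs looked roughly like this (a minimal sketch; check the docs linked above, since parameter names can differ between TransformerLens versions):

import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder name
    n_devices=2,           # split the layers across 2 GPUs, as described above
    dtype=torch.bfloat16,  # half precision so the 70B fits in ~160GB of VRAM
)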

Don't hesitate to ask if you want more info, I will try my best.
If you want to ask the OP of the script something, here's the original post: https://huggingface.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2/discussions/3#6632d510329025d77477c5a5

I had issues a lot like this when trying to reimplement the inference-time intervention in llama.cpp. The intervention would prevent some refusals, but not others.

Here's the original code I was using, which was my attempt at a faithful translation of A = A - (A ⋅ R̂) × R̂:

struct ggml_tensor * llm_build_ablation(
        struct ggml_context * ctx,
         struct ggml_tensor * cur,
         struct ggml_tensor * dir) {
    if (dir == nullptr) {
        return cur;
    }

    struct ggml_tensor * out;

    // [embd toks] x [embd dirs] -> [toks dirs]
    out = ggml_mul_mat(ctx, cur, dir);

    // [dirs toks] x [dirs embd] -> [toks embd]
    out = ggml_mul_mat(ctx, ggml_cont(ctx, ggml_transpose(ctx, out)),
                            ggml_cont(ctx, ggml_transpose(ctx, dir)));

    // [toks embd] -> [embd toks]
    out = ggml_scale(ctx, ggml_cont(ctx, ggml_transpose(ctx, out)), -1.0f);

    cur = ggml_add(ctx, cur, out);

    return cur;
}

I changed the code slightly to this: A = A + relu(-1.0 × (A ⋅ R̂)) × R̂, and the intervention works much better in all test cases. At 32 examples, it works in almost all cases. I haven't looked into how to translate it to the weight orthogonalization math.

struct ggml_tensor * llm_build_ablation(
        struct ggml_context * ctx,
         struct ggml_tensor * cur,
         struct ggml_tensor * dir) {
    if (dir == nullptr) {
        return cur;
    }

    struct ggml_tensor * out;

    // [embd toks] x [embd dirs] -> [toks dirs]
    out = ggml_mul_mat(ctx, cur, dir);

    // if scalar here is positive, becomes incoherent
    // if no relu, intervention does not prevent refusal
    // ???
    out = ggml_relu(ctx, ggml_scale(ctx, out, -1.0f));

    // [dirs toks] x [dirs embd] -> [toks embd]
    out = ggml_mul_mat(ctx, ggml_cont(ctx, ggml_transpose(ctx, out)),
                            ggml_cont(ctx, ggml_transpose(ctx, dir)));

    // [toks embd] -> [embd toks]
    out = ggml_cont(ctx, ggml_transpose(ctx, out));

    cur = ggml_add(ctx, cur, out);

    return cur;
}
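
To make the difference concrete: the original form always removes the component along R̂, while the relu variant only removes it when A ⋅ R̂ is negative and leaves everything else untouched, so which activations get touched depends on which way dir points. A quick toy check in plain PyTorch (not ggml):

import torch

d = 8
r = torch.randn(d); r = r / r.norm()   # unit direction R̂
A = torch.randn(4, d)                  # a few activation vectors

proj = A @ r                           # A ⋅ R̂ for each row

ablated  = A - proj[:, None] * r                # A - (A ⋅ R̂) R̂: projection always zeroed
relu_ver = A + torch.relu(-proj)[:, None] * r   # A + relu(-(A ⋅ R̂)) R̂: only acts where A ⋅ R̂ < 0

print(ablated @ r)   # ~0 everywhere
print(relu_ver @ r)  # ~0 where proj was negative, unchanged where it was positive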

After fiddling a bit, I found that for Llama-3 I needed to 1) get the direction for each layer and 2) intervene in each layer.
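
Concretely, intervening in every layer could look something like this with TransformerLens hooks (a rough sketch; `model` and the per-layer unit directions `dirs` are assumed to come from the steps above, and the resid_pre hook point is just one choice):

from transformer_lens import utils

def make_ablation_hook(direction):
    def hook(resid, hook):
        # resid: [batch, pos, d_model]; remove the component along this layer's direction
        d = (direction / direction.norm()).to(resid.device, resid.dtype)
        return resid - (resid @ d)[..., None] * d
    return hook

fwd_hooks = [
    (utils.get_act_name("resid_pre", layer), make_ablation_hook(dirs[layer]))
    for layer in range(model.cfg.n_layers)
]

with model.hooks(fwd_hooks=fwd_hooks):
    print(model.generate("<some prompt that normally gets refused>", max_new_tokens=64))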