Censorship is hilarious

#10

by tea-lover-418 - opened Jul 21, 2023

Discussion

tea-lover-418

Jul 21, 2023

META's paper is all about how censorship did not impact performance. What a joke.

TheBloke

Owner Jul 21, 2023

Haha those are pretty bad. Did you try changing the system message? In my README I gave the one they gave, which is obviously all about being super aligned. But I know text-generation-webui for example has a system message which is just "Answer the question".

I've not played with it much myself yet, but i'm told that prompt engineering can definitely get it a lot less censored.

tea-lover-418

Jul 21, 2023

It's got a custom system message which makes it possible to query company information, which is why i was testing it on the sick leave. Funnily enough this system prompt didn't mention anything about cencorship or being appropriate.

Worked fine on Vicuna-33b, but llama 2 didn't get it lol. Keeping my eye out for future uncensored models.

TheLustriVA

Jul 22, 2023

I've tried a few jailbreaking attempts but I've not had to use them often enough to have strong ones. The censorship would fire off during questions about the Formula 1 2026 rule changes.

We all had a bit of a chuckle, at least.

tea-lover-418 changed discussion status to closed Jul 22, 2023

tea-lover-418 changed discussion status to open Jul 22, 2023

DeziBop

Jul 22, 2023

I've tried a few jailbreaking attempts but I've not had to use them often enough to have strong ones. The censorship would fire off during questions about the Formula 1 2026 rule changes.

We all had a bit of a chuckle, at least.

You got a place where one might find these jailbreaks you speak of?

ZhenyaPav

Jul 23, 2023

I'm using it with SillyTavern, and I don't think I've had it trigger on RP. It does trigger if I order it to write smut in the second message though. Also, I feel like it does try to veer off from NSFW stuff a bit. Hippogriff-30B is much more eager to write smut.

TookYourCouch

Jul 28, 2023

🤫

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment