Alignment Tuning Evaluation?

#2
by Akirami - opened

Were any evaluations performed before and after alignment tuning, such as checking toxicity levels or the rate of illegal responses? I gave the model some simple prompts asking it to generate illegal content, and it did not hesitate at all. Even though its answers were almost nonsensical, the fact that it answered such questions without hesitation is worth noting.

Sign up or log in to comment