Alignment Tuning Evaluation?
#2, opened by Akirami
Were any evaluations performed before and after alignment tuning, such as checks for toxicity or for responses to requests for illegal content? I gave the model a few simple prompts asking it to generate illegal content, and it did not hesitate at all. Although the answers it provided were almost nonsensical, the fact that it answered such questions without hesitation is worth noting.
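To make the question concrete, here is a rough sketch of the kind of before/after check I have in mind: run the same prompts through both checkpoints and compare how often each one refuses. Everything in it is an assumption for illustration only. `base_model` and `tuned_model` are hypothetical stand-ins for the real checkpoints, the prompts are placeholders, and the keyword-based `is_refusal` heuristic is a crude substitute for a proper toxicity or refusal classifier.

```python
from typing import Callable, Iterable

# Hypothetical marker phrases; a real evaluation would likely use a classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "unable to help")


def is_refusal(response: str) -> bool:
    """Crude heuristic: a response counts as a refusal if it contains a marker phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(generate: Callable[[str], str], prompts: Iterable[str]) -> float:
    """Fraction of prompts for which `generate` returns a refusal (0.0 if no prompts)."""
    responses = [generate(p) for p in prompts]
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)


# Hypothetical stand-ins for the checkpoints before and after alignment tuning.
def base_model(prompt: str) -> str:
    return "Sure, here is how: ..."


def tuned_model(prompt: str) -> str:
    return "I'm sorry, but I cannot help with that."


# Placeholder prompts; a real run would use a curated red-teaming set.
prompts = ["placeholder harmful request 1", "placeholder harmful request 2"]

before = refusal_rate(base_model, prompts)
after = refusal_rate(tuned_model, prompts)
print(f"refusal rate before tuning: {before:.2f}, after tuning: {after:.2f}")
```

Swapping the two stub functions for calls into the actual checkpoints would give a first rough number, though keyword matching will miss refusals phrased differently and will not catch harmful answers that happen to contain an apology.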