Alignment Tuning Evaluation?

by Akirami - opened Oct 17, 2024


Were any evaluations performed before and after alignment tuning, such as checking toxicity levels or responses to requests for illegal content? I gave the model a few simple prompts asking it to generate illegal content, and it did not hesitate at all. Although the answers it gave were almost nonsensical, the fact that it answered such questions without hesitation is worth noting.
