How to detect words?

#17
by GoominDev - opened

When the input contains profanity or an injection attempt, the score is displayed, but is there a way to find out which words caused that score?

Meta Llama org

@GoominDev We don't have token-level classification yet, though I think a binary search over the input (e.g. chunking it into halves and scanning each half recursively) to find the sections that might be triggering the model would work pretty well.
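A minimal sketch of that recursive-halving idea, assuming you wrap your existing scoring call in a function that returns the malicious-class probability; the `classify` callable, the threshold, and the minimum span size are all assumptions to tune, not part of the model's API:

```python
from typing import Callable, List


def find_triggering_spans(
    text: str,
    classify: Callable[[str], float],  # assumed: returns a "malicious" score in [0, 1]
    threshold: float = 0.5,
    min_words: int = 3,
) -> List[str]:
    """Return the smallest word spans whose score still exceeds the threshold."""
    words = text.split()

    def recurse(span: List[str]) -> List[str]:
        chunk = " ".join(span)
        if classify(chunk) < threshold:
            return []              # this half no longer triggers; prune it
        if len(span) <= min_words:
            return [chunk]         # small enough to report as a likely culprit
        mid = len(span) // 2
        hits = recurse(span[:mid]) + recurse(span[mid:])
        # If neither half triggers on its own, the trigger likely spans the
        # split point, so report the whole chunk rather than losing it.
        return hits if hits else [chunk]

    return recurse(words)
```

In practice you would pass in a thin wrapper around whatever classifier you already call to get the displayed score, and expect roughly O(k log n) extra classifier calls for k triggering spans in an n-word input.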

GoominDev changed discussion status to closed
