How to detect words?
#17
by GoominDev - opened
When the input contains profanity or an injection, the score is displayed,
but is there a way to find out which words contributed to that score?
@GoominDev We don't have token-level classification yet, but I think a binary search over the input (recursively splitting it into halves and scoring each half) would work pretty well for finding the section that triggers the model.
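A minimal sketch of that binary-search idea, assuming a hypothetical `score(text) -> float` wrapper around the classifier (not a real API of this model): split the word list in half, keep recursing into any half that still scores above the threshold, and report the surviving words. If neither half triggers on its own, the combination is the trigger, so the whole chunk is reported.

```python
def find_trigger_words(words, score, threshold=0.5):
    """Bisect a list of words to locate the ones the classifier flags.

    `score` is a hypothetical stand-in for the model: it takes a string
    and returns a number in [0, 1].
    """
    if not words or score(" ".join(words)) < threshold:
        return []                       # this chunk is clean
    if len(words) == 1:
        return list(words)              # single flagged word
    mid = len(words) // 2
    hits = (find_trigger_words(words[:mid], score, threshold)
            + find_trigger_words(words[mid:], score, threshold))
    # If neither half triggers alone, the trigger is the combination,
    # so fall back to reporting the whole chunk.
    return hits or list(words)


# Toy stand-in for the real model: flags any text containing "badword".
toy_score = lambda t: 1.0 if "badword" in t else 0.0

print(find_trigger_words("hello there badword friend".split(), toy_score))
# → ['badword']
```

Splitting on word boundaries (rather than raw character halves) avoids cutting a triggering word in two, which would make both halves score clean. The cost is O(k log n) classifier calls for k flagged words, which is cheap compared with scoring every token.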
GoominDev changed discussion status to closed