Qun Gao
qgao007
1 follower · 1 following
AI & ML interests
None yet
Recent Activity
Reacted to daniel-de-leon's post with 🔥 about 1 month ago
As the rapid adoption of chatbots and Q&A models continues, so do concerns about their reliability and safety. In response, many state-of-the-art models are being tuned to act as safety guardrails that protect against malicious usage and avoid undesired, harmful output. I published a Hugging Face blog introducing a simple, proof-of-concept, RoBERTa-based LLM that my team and I fine-tuned to detect toxic prompt inputs into chat-style LLMs. The article explores some of the tradeoffs of fine-tuning larger decoder models vs. smaller encoder models and asks whether "simpler is better" in the arena of toxic prompt detection.
Link to blog: https://huggingface.co/blog/daniel-de-leon/toxic-prompt-roberta
Link to model: https://huggingface.co/Intel/toxic-prompt-roberta
Link to OPEA microservice: https://github.com/opea-project/GenAIComps/tree/main/comps/guardrails/toxicity_detection
A huge thank you to my colleagues who helped contribute: @qgao007, @mitalipo, @ashahba and Fahim Mohammad
Upvoted an article about 1 month ago
Occam's Sheath: A Simpler Approach to AI Safety Guardrails
Organizations
Posts
1
Hello world!
Papers
2
arxiv:2309.14592
arxiv:2306.16601
spaces
1
Runtime error
💻
Test
models
None public yet
datasets
None public yet