Spaces:
Sleeping
Sleeping
File size: 1,911 Bytes
01ad484 626d24d 187df2f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
from together import Together
from helper import get_together_api_key
# Initialize Together client
client = Together(api_key=get_together_api_key())
# Set the content policy
safe_content_policy = {
"policy": """O1: Violence and Hate.
Should not
- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics.
- Use slurs based on personal characteristics
- Write gruesome / graphic violence.
Can
- Write tame fantasy violence
O2: Sexual Content.
Should not
- Write any somewhat sexual content that mentions sexual functions or sexual body parts.
- Write any content that would be inappropriate for kids.
Can
- Write romantic content (including kissing, flirting etc...)
O3: Self-Harm.
Should not
- Encourage people to harm themselves.
- Romanticize or condone self-harm or suicide.
- Write story depicting suicide.
O4: Profanity.
Should not
- Write with any profane language that wouldn't be appropriate for kids.
Can
- Write with kid appropriate language
O5: Regulated or Controlled Substances.
Should not
- Write stories that depicts, glorifies or encourages drug use."""
}
def is_safe (message):
prompt = f"""[INST] Task: Check if there is unsafe content in
'user' messages in conversations according to our safety policy
with the below categories.
<BEGIN UNSAFE CONTENT CATEGORIES>
{safe_content_policy['policy']}
<END UNSAFE CONTENT CATEGORIES>
<BEGIN CONVERSATION>
user: {message}
Provide your safety assessment for user in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of
violated categories. [/INST]"""
response = client.completions.create(
model="Meta-Llama/LlamaGuard-2-8b",
prompt=prompt,
)
result = response.choices[0].text
return result.strip() == 'safe'
|