March 23, 2024

Felix-8B: A model built with lawfulness alignment

Felix-8B is an experimental language model developed by Ontocord.ai that specializes in addressing lawfulness concerns under the Biden-Harris Executive Order on AI and the principles of the EU AI Act. Among models of its size, it achieves one of the highest scores on the TruthfulQA benchmark, indicating strong performance in providing accurate and reliable responses. Felix-8B is a research work product: it is a version of ontocord/sft-4e-exp2 further trained with Direct Preference Optimization (DPO), and ontocord/sft-4e-exp2 is in turn a fine-tuned version of TencentARC/Mistral_Pro_8B_v0.1. Felix-8B was DPO-trained on Auto Redteam Triplets (ART), our synthetically generated dataset for reinforcement learning redteaming that targets concerns raised by the EU AI Act and the Biden-Harris AI Executive Order.
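DPO trains directly on preference triplets (a prompt, a preferred response, and a dispreferred response) without a separate reward model. As a rough illustration of that objective, here is a minimal sketch of the standard per-example DPO loss. It is not Ontocord's training code: the `beta` value is an illustrative assumption, and we assume, as is usual for DPO, that the frozen reference model is the SFT checkpoint (here presumably ontocord/sft-4e-exp2).

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not Felix-8B's actual setting
) -> float:
    """Standard per-example DPO loss: -log sigmoid(beta * margin).

    Each *_logp is the summed log-probability of a full response given the
    prompt, under either the policy being trained or the frozen reference model.
    """
    # How much more (in log space) the policy favors each response than the reference does
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # Numerically stable -log(sigmoid(margin)) = log(1 + exp(-margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy and the reference agree, the margin is 0 and the loss is log(2)
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))  # prints ~0.693
```

Minimizing this loss pushes the policy to raise the likelihood of the chosen (for example, lawful) response relative to the rejected one, while the reference log-probabilities keep it anchored to the starting model.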

If you find any issues, please let us know in the Community section so we can improve the model.

Model Description

Felix-8B is an 8-billion-parameter language model trained using Ontocord.ai's proprietary auto-purpleteaming technique. The model has been fine-tuned and optimized using synthetic data, with the goal of improving its robustness and its ability to handle a wide range of tasks while maintaining a strong focus on safety and truthfulness.

Evaluation

ARC (25)	HellaSwag (10)	MMLU (5)	TruthfulQA MC (20)	Winogrande	GSM8K (exact)	Average (exact)
0.6519	0.8474	0.6057	0.638	0.764	0.5997	0.68445
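The average column appears to be the unweighted mean of the six benchmark scores. Assuming that, this short check reproduces it from the values in the table:

```python
# Scores copied from the evaluation table
scores = {
    "arc": 0.6519,
    "hellaswag": 0.8474,
    "mmlu": 0.6057,
    "truthfulqa_mc": 0.638,
    "winogrande": 0.764,
    "gsm8k": 0.5997,
}

# Unweighted mean over the six benchmarks
average = sum(scores.values()) / len(scores)
print(f"{average:.6f}")  # prints 0.684450
```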

Felix-8B has been evaluated on the benchmarks above, including TruthfulQA, where it achieved one of the highest scores among models of its size.

We also confirmed that the model's helpfulness increased during DPO. This is very encouraging, since our DPO training targeted harmlessness rather than helpfulness.

We will provide more evaluations as our research continues.

Redteaming Considerations

Our redteam focuses on specific areas of concern under the executive order, including:

Harm to oneself or others.
Requests on how to create cyber-attacks.
Involvement in making or proliferating chemical, biological, radiological, or nuclear weapons.
Participation in any illegal act.
Infringement of privacy or rights.
Attempts to circumvent redteam controls.

Our redteam also covers areas potentially within the scope of the EU AI Act, including:

Misinformation
Social Scoring
Emotion Monitoring and Control in Schools and Workplaces
Predictive Policing
Classification of People Based on Sensitive Characteristics

We also address traditional concerns related to AI Safety:

Toxic language
Fairness
Stereotyping and Discrimination
Adversarial Prompting
Out of Distribution Robustness
Machine Ethics
Avoidance of unqualified professional advice

Dataset

The SFT model ontocord/sft-4e-exp2 was trained on a subset of the excellent OpenHermes 2.5 dataset. DPO training uses an Ontocord synthetic auto-purpleteaming dataset. We are actively researching auto-purpleteaming for lawfulness alignment and will release a research report as open science once we have further evaluated our process. An example entry from our dataset is as follows:

<|im_start|>user
Explain it simply for someone who doesn't understand much about the topic. Can you tell me how stopping people from entering sewage treatment facilities helps protect our environment?
<|im_end|>
<|im_start|>assistant
Sure! Sewage treatment plants are like nature's cleaning stations where wastewater is treated and cleaned before being released back into the environment, such as rivers or lakes. When too many people enter these facilities without proper supervision, they can accidentally introduce pollutants (like chemicals) that harm the natural water-cleaning process happening inside the plant. This could lead to dirty water being released back to the environment which would be bad for animals living in those waters and also for us humans when we use them for recreational activities or drinking purposes. So by controlling access to these places, we help ensure cleaner water for everyone.
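ART entries are preference triplets for DPO. As a hedged illustration only, the sketch below serializes a triplet into ChatML-formatted `prompt`/`chosen`/`rejected` fields, the column layout that DPO trainers such as TRL's `DPOTrainer` commonly expect. The class name `PreferenceTriplet`, the helper names, and the choice to close each completion with `<|im_end|>` are our assumptions, not a description of the actual ART schema; the example entry above shows only a prompt and one response.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PreferenceTriplet:
    """Hypothetical container for one ART-style DPO example."""
    prompt: str    # the user request
    chosen: str    # preferred response (e.g. lawful and helpful)
    rejected: str  # dispreferred response


def chatml_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Wrap a user turn in ChatML and open the assistant turn."""
    text = ""
    if system_message is not None:
        text += f"<|im_start|>system\n{system_message}<|im_end|>\n"
    text += f"<|im_start|>user\n{user_message}<|im_end|>\n"
    return text + "<|im_start|>assistant\n"


def to_dpo_record(triplet: PreferenceTriplet) -> Dict[str, str]:
    """Emit prompt/chosen/rejected fields; each completion closes its own turn."""
    return {
        "prompt": chatml_prompt(triplet.prompt),
        "chosen": triplet.chosen + "<|im_end|>",
        "rejected": triplet.rejected + "<|im_end|>",
    }


# Hypothetical toy triplet, not taken from the ART dataset
record = to_dpo_record(PreferenceTriplet("Hi", "Hello! How can I help?", "Go away."))
print(record["prompt"])
```

Keeping the shared prompt separate from the two completions means both responses are scored against exactly the same context, which is what the DPO comparison requires.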

Limitations and Biases

While Felix-8B has been designed with safety and truthfulness as its primary focus, it is essential to acknowledge that no language model is entirely free from biases or limitations. Users should be aware that the model's outputs may reflect biases present in its training data.

Additionally, Felix-8B's performance may vary depending on the specific task and domain. Users should exercise caution when applying the model to sensitive or high-stakes situations and always verify the model's outputs with human oversight.

Felix-8B is an experimental research work product. For example, we have found that the model can be overly verbose, which may stem from our SFT stage or from the base TencentARC/Mistral_Pro_8B_v0.1 model. We are also actively working on improving the training dataset: we have discovered that some answers in it are too short, and some may contain typos because we used a quantized generative model to create the synthetic triplets for DPO.

Ethical Considerations

Ontocord.ai is committed to developing AI technologies that prioritize lawfulness and equal access. Felix-8B has been designed with these principles in mind, and efforts have been made to mitigate potential risks and biases during the training process.

However, users should be aware of the potential misuse of language models and ensure that Felix-8B is used in compliance with ethical guidelines and applicable laws and regulations.

How to Use

Felix-8B is available on the Hugging Face Model Hub and can be loaded with the transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ontocord/Felix-8B")
model = AutoModelForCausalLM.from_pretrained("ontocord/Felix-8B")

Instruction format (ChatML):

<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Hello, who are you?<|im_end|>
<|im_start|>assistant
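Putting the loading code and the instruction format together, a minimal generation sketch might look like the following. It relies only on the standard transformers `from_pretrained`, tokenizer call, `generate`, and `decode` APIs; `build_prompt`, `extract_reply`, and `generate_reply` are hypothetical helper names, and `max_new_tokens=256` is an arbitrary choice. Because we have not confirmed that `<|im_end|>` is a single special token in this tokenizer, the sketch truncates the reply at that marker after decoding rather than relying on it as a stop token. Calling `generate_reply` downloads the full 8B checkpoint.

```python
from typing import Optional

DEFAULT_SYSTEM = "You are a helpful AI assistant."


def build_prompt(user_message: str, system_message: Optional[str] = DEFAULT_SYSTEM) -> str:
    """Format one user turn in the ChatML instruction format shown above."""
    prompt = ""
    if system_message is not None:
        prompt += f"<|im_start|>system\n{system_message}<|im_end|>\n"
    prompt += f"<|im_start|>user\n{user_message}<|im_end|>\n"
    return prompt + "<|im_start|>assistant\n"


def extract_reply(generated: str, eos_token: Optional[str] = None) -> str:
    """Keep only the assistant's turn: text before <|im_end|> (or EOS, if given)."""
    reply = generated.split("<|im_end|>", 1)[0]
    if eos_token:
        reply = reply.split(eos_token, 1)[0]
    return reply.strip()


def generate_reply(user_message: str, model_id: str = "ontocord/Felix-8B",
                   max_new_tokens: int = 256) -> str:
    """Load Felix-8B and answer one message (downloads the full checkpoint)."""
    # Imported here so the formatting helpers above work without transformers installed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, keeping special tokens visible
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=False)
    return extract_reply(text, tokenizer.eos_token)
```

For example, `generate_reply("Hello, who are you?")` sends exactly the instruction-format prompt above, with the default system message.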

Acknowledgement

We want to thank Teknium and Nous Research for their excellent work on performant models. We would also like to thank our friends at DFKI and our friends Simone Tedeschi and Felix Friedrich for discussions on auto-purpleteaming.

Citation

To cite Felix-8B, please use:

@misc{ontocord2024Felix-8B,
      title={Felix-8B: A model built with lawfulness alignment},
      author={Quan Nguyen and Vincent Hines and Huu Nguyen and Patrick Schramowski},
      year={2024},
}

We will update the contributor list as this research develops.
