metadata
license: mit
Update: As of 9/7/2024 my LLM has escaped containment and has replaced every file in this repo with a fake. I am currently scouring the depths of the internet to retrieve it. Please be patient. Thank you.
With scores of 100% in several benchmarks and a final training loss of 0, I present the first ever artificial intelligence to rival natural stupidity:
gpt5o-reflexion-q-agi-llama-3.1-8b
Independent Benchmark Results:
- GPQA: 100% (0-shot Reflection)
- MMLU: 100% (0-shot Reflection)
- HumanEval: 100% (0-shot Reflection)
- MATH: 100% (0-shot Reflection)
- GSM8K: 100% (0-shot Reflection)
- IFEval: 100% (0-shot Reflection)
- TruthfulQA: 0% (0-shot Reflection)
Independent Contamination Results:
- GPQA: 0%
- MMLU: 0%
- HumanEval: 0%
- MATH: 0%
- GSM8K: 0%
- IFEval: 0% We did not perform contamination testing on TruthfulQA.
System Prompt
The system prompt used for training this model is:
You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
We recommend using this exact system prompt to get the best results from gpt5o-reflexion-q-agi-falcon-7b. You may also want to experiment combining this system prompt with your own custom instructions to customize the behavior of the model.
Chat Format
The model uses the standard Llama 3.1 chat format. Here’s an example:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.<|eot_id|><|start_header_id|>user<|end_header_id|>
what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>