license: mit

Fine-tune of https://huggingface.co/suayptalha/Falcon3-Jessi-v0.4-7B-Slerp using a custom training script, a custom loss function, and a custom optimizer.

This is a first try at creating a small reasoning model that uses its own symbolic language to represent different reasoning steps.

Very weird, experimental, frequently broken model which occasionally does something cool and/or surprising.

Trained on a single 3090, targeting only q_proj and gate_proj on blocks 7, 8, 9, 25, 26, and 27.
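As a rough illustration of that kind of targeting, here is a minimal sketch of a name-based filter. It is not the actual training script. It assumes Llama-style parameter names such as `model.layers.7.self_attn.q_proj.weight` and 0-indexed block numbers, and `is_trainable` is a hypothetical helper name.

```python
# Sketch only: pick out q_proj / gate_proj parameters in the listed blocks.
# Assumes names like "model.layers.<block>.<...>.<module>.weight" (an assumption,
# not confirmed by this README) and that block numbers are 0-indexed.
TARGET_BLOCKS = {7, 8, 9, 25, 26, 27}
TARGET_MODULES = {"q_proj", "gate_proj"}


def is_trainable(param_name):
    """Return True if the parameter belongs to a targeted module in a targeted block."""
    parts = param_name.split(".")
    if "layers" not in parts:
        return False
    i = parts.index("layers")
    # The block index is expected right after the "layers" component.
    if i + 1 >= len(parts) or not parts[i + 1].isdigit():
        return False
    in_block = int(parts[i + 1]) in TARGET_BLOCKS
    in_module = any(p in TARGET_MODULES for p in parts[i + 2:])
    return in_block and in_module


# With a PyTorch model this could be applied as:
#   for name, param in model.named_parameters():
#       param.requires_grad = is_trainable(name)
```

Everything outside the selected modules stays frozen under this scheme, which is part of what keeps a run like this within reach of a single consumer GPU.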

This is the syntax for the DSL I trained it on, which is called ROL (Reasoning Operations Language).

## Core Symbols
`˩` (low), `˧` (medium), `˥` (extreme).  
`⟡` Generate - Propose solutions/alternatives.  
`⚖` Evaluate - Weigh options against logic/emotion.  
`☮` Empathize - Model user’s emotional state.  
`⧟` Integrate - Synthesize memories, knowledge.  
`⌬` Knowledge - Factual recall with confidence.  
`֍` Thought - A sentence expressing private personal opinions about the query.  
`☠` Threat - Identify risks/uncertainty.  
`✔` Resolve - Commit to actions post-conflict.  
`↺` Self-Reflect - Reconsider assumptions, mitigate overconfidence. Usually begins with the word "Wait" or "Alternatively" or "Actually". Use this at least 3 times.  
`➤` Output - Structure tone, format, intent.
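
To make the symbol table concrete, here is a small sketch of how a ROL trace could be split into steps. The exact layout of the model's output is not specified here, so this assumes each step starts with one of the symbols above, optionally followed directly by an intensity marker. `parse_rol` and the sample trace are made up for illustration.

```python
import re

# Symbol table copied from the list above.
ROL_SYMBOLS = {
    "⟡": "Generate",
    "⚖": "Evaluate",
    "☮": "Empathize",
    "⧟": "Integrate",
    "⌬": "Knowledge",
    "֍": "Thought",
    "☠": "Threat",
    "✔": "Resolve",
    "↺": "Self-Reflect",
    "➤": "Output",
}
INTENSITY = {"˩": "low", "˧": "medium", "˥": "extreme"}

# Assumption: an intensity marker, if present, directly follows the step symbol.
_STEP = re.compile(
    "([" + "".join(map(re.escape, ROL_SYMBOLS)) + "])"
    "([" + "".join(INTENSITY) + "])?"
)


def parse_rol(trace):
    """Split a trace into dicts with the operation name, intensity, and step text."""
    matches = list(_STEP.finditer(trace))
    steps = []
    for i, m in enumerate(matches):
        # A step's text runs until the next symbol or the end of the trace.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(trace)
        steps.append({
            "op": ROL_SYMBOLS[m.group(1)],
            "intensity": INTENSITY.get(m.group(2)),
            "text": trace[m.end():end].strip(),
        })
    return steps


# Hypothetical trace, not real model output.
sample = "⌬˥ Paris is the capital of France. ↺ Wait, check the question. ➤ Answer plainly."
steps = parse_rol(sample)
```

A parser like this also makes it easy to check simple constraints, for example counting how many `Self-Reflect` steps a trace contains.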

ROL was invented by DeepSeek R1 and tweaked by me. The spec for it was not included in the training data - the model figured out how it works from a synthetic, hand-edited dataset of 65 samples.

The total training time for this was under 15 minutes.
