
Memphis-scribe 3B is a finetune of Memphis-CoT 3B on more creative data. Memphis-CoT 3B is itself a finetune of StableLM 3B 4e1t.

It is trained further on TinyCoT, but also on

Training procedure

I started from Memphis-CoT 3B, which used a novel iterative contrastive finetuning procedure to improve reasoning ability.

First, I generated completions exactly as in each Memphis-CoT training cycle.

Then, for each example in the dataset, I sampled one correct and one incorrect completion. I applied the same ranking loss as in Memphis-CoT over this pair (with a weight of 0.2), but computed the cross-entropy loss over the dataset example's own tokens rather than over the generated completion's tokens.
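A minimal sketch of how the two loss terms might combine for a single example is below. It assumes an RRHF-style hinge ranking loss on length-normalized log-probabilities; the card does not spell out the exact form of Memphis-CoT's ranking loss, so that choice and all function names here are hypothetical. Inputs are the per-token log-probabilities the model assigns to each sequence.

```python
# Hypothetical sketch: assumes an RRHF-style hinge ranking loss on
# length-normalized log-probabilities (not confirmed by the model card).

RANK_WEIGHT = 0.2  # weight on the ranking term, as stated in the card


def mean_logprob(token_logprobs):
    """Length-normalized sequence score: average per-token log-probability."""
    return sum(token_logprobs) / len(token_logprobs)


def ranking_loss(correct_logprobs, incorrect_logprobs):
    """Hinge penalty, nonzero only when the incorrect completion outscores the correct one."""
    return max(0.0, mean_logprob(incorrect_logprobs) - mean_logprob(correct_logprobs))


def cross_entropy(example_logprobs):
    """Negative mean log-likelihood of the dataset example's own tokens."""
    return -mean_logprob(example_logprobs)


def combined_loss(example_logprobs, correct_logprobs, incorrect_logprobs):
    # Cross-entropy on the dataset example tokens (not the sampled completions),
    # plus the weighted ranking loss over the sampled correct/incorrect pair.
    return cross_entropy(example_logprobs) + RANK_WEIGHT * ranking_loss(
        correct_logprobs, incorrect_logprobs
    )


if __name__ == "__main__":
    example = [-0.5, -1.0, -1.5]    # log p of the reference example tokens
    correct = [-2.0, -2.0]          # log p of a sampled correct completion
    incorrect = [-1.0, -1.0, -1.0]  # log p of a sampled incorrect completion
    print(combined_loss(example, correct, incorrect))
```

The length normalization keeps the ranking term from simply favoring the shorter completion; whether Memphis-CoT normalizes this way is an assumption.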

Finally, I merged the resulting model with the Memphis-CoT checkpoint from before this additional training, again using spherical linear interpolation (SLERP), this time with a weight of 0.8.
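SLERP interpolates along the arc between two weight vectors rather than along the straight line between them. The plain-Python sketch below, over flattened parameter lists, is an illustration rather than the author's merge code. Merging each tensor separately, treating t as the weight on the further-trained model, and the helper names are all assumptions; the card does not say which side the 0.8 applies to.

```python
import math


def lerp(v0, v1, t):
    """Plain linear interpolation between two equal-length vectors."""
    return [(1 - t) * a + t * b for a, b in zip(v0, v1)]


def slerp(v0, v1, t, eps=1e-8):
    """Spherically interpolate between two flat weight vectors (t=0 -> v0, t=1 -> v1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0 or norm1 == 0:
        # The angle is undefined for a zero vector; fall back to linear interpolation.
        return lerp(v0, v1, t)
    # Angle between the vectors from their normalized dot product,
    # clamped to [-1, 1] to guard against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly (anti)parallel vectors make the formula unstable; use lerp instead.
        return lerp(v0, v1, t)
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def merge_state_dicts(base, tuned, t):
    """Hypothetical per-tensor merge: SLERP each named parameter separately."""
    return {name: slerp(base[name], tuned[name], t) for name in base}


# Hypothetical usage: toy 2-element "tensors" standing in for real weights.
memphis_cot_params = {"layer.weight": [1.0, 0.0]}
trained_params = {"layer.weight": [0.0, 1.0]}
merged = merge_state_dicts(memphis_cot_params, trained_params, 0.8)
```

Unlike plain averaging, SLERP preserves the norm of the interpolated vector when both inputs have the same norm, which is one common motivation for using it in model merging.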

Prompt formats

### User:
[insert instruction here]
### Assistant:
[insert response here]
### User:
...

Alternatively:

### System:
[Insert system message here, focused on roleplay]
### User:
[insert instruction here]
### Assistant:
[insert response here]
### User:
...
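A small helper for assembling either format might look like the following. The exact whitespace (one newline after each header and each message, plus a trailing newline after the final "### Assistant:") is an assumption, since the card only shows the line layout; build_prompt is a hypothetical name.

```python
def build_prompt(turns, system=None):
    """Assemble a prompt in the "### User:" / "### Assistant:" format.

    `turns` is a list of (user_message, assistant_response) pairs. The last
    pair may use None as the response, in which case the prompt ends with an
    open "### Assistant:" header for the model to complete.
    """
    lines = []
    if system is not None:
        # Optional system message, e.g. one focused on roleplay.
        lines += ["### System:", system]
    for user, assistant in turns:
        lines += ["### User:", user, "### Assistant:"]
        if assistant is not None:
            lines.append(assistant)
    # Assumption: every header and message sits on its own line.
    return "\n".join(lines) + "\n"


prompt = build_prompt(
    [("Hi", "Hello!"), ("Tell me a short story.", None)],
    system="You are a storyteller.",
)
```

When generating, it is probably sensible to stop decoding once the model emits a new "### User:" header; this is an assumption rather than documented behavior.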

Benchmarks

This model scores noticeably lower than Memphis-CoT on the benchmarks below, but it is better suited to chat and creative writing tasks. This tradeoff is expected, especially for small models.

| Model | GSM8K (5-shot) | AGIEval (English/Nous subset, acc_norm) | BIG Bench Hard (CoT, few-shot*) |
|---|---|---|---|
| StableLM 3B Base | 2.05% | 25.14% | 36.75% |
| Memphis-CoT 3B | 18.8% | 27.22% | 36.92% |
| Memphis-scribe 3B | 9.55% | 24.78% | |
*5-shot: the LM Evaluation Harness bbh_cot_fewshot task applies its few-shot examples automatically, even with num_fewshot=0.
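The size of the drop relative to Memphis-CoT can be read straight off the table. This short snippet copies those scores (BIG Bench Hard is omitted because no Memphis-scribe score is listed) and computes the percentage-point change per benchmark: about 9.25 points lower on GSM8K, roughly half of Memphis-CoT's score, and about 2.44 points lower on AGIEval.

```python
# Scores (percent) copied from the benchmark table; BIG Bench Hard is left
# out because no Memphis-scribe 3B score is reported.
scores = {
    "Memphis-CoT 3B": {"GSM8K": 18.8, "AGIEval": 27.22},
    "Memphis-scribe 3B": {"GSM8K": 9.55, "AGIEval": 24.78},
}


def deltas(table, base, other):
    """Percentage-point change from `base` to `other` on each benchmark, rounded to 2 places."""
    return {bench: round(table[other][bench] - table[base][bench], 2) for bench in table[base]}


change = deltas(scores, "Memphis-CoT 3B", "Memphis-scribe 3B")
print(change)
```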
Model size: 2.8B params (Safetensors, tensor type BF16)