A bagel, with everything

Just a fiction-oriented 4bpw exl2 quantization of https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2

Quantized on 300K tokens of two Vicuna-format chats, a sci-fi story, and a fiction story, all at long context. This should yield better storywriting performance than the default exl2 quantization.
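The calibration files themselves aren't published. As a hypothetical illustration only, here is a sketch of how a multi-turn chat is typically flattened into Vicuna v1.1-style text before it goes into a calibration parquet. The template is assumed to follow FastChat's `vicuna_v1.1` convention. `to_vicuna`, the system prompt constant, and the sample turns are placeholders, not the author's data.

```python
# Hypothetical sketch: flatten a chat into Vicuna v1.1-style text, assuming
# FastChat's convention: "<system> USER: ... ASSISTANT: ...</s>USER: ..."
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def to_vicuna(turns, system=VICUNA_SYSTEM, eos="</s>"):
    """turns: list of (user_message, assistant_reply) pairs."""
    text = system + " "
    for user_msg, assistant_msg in turns:
        # Each exchange ends with EOS; the next "USER:" follows it directly.
        text += f"USER: {user_msg} ASSISTANT: {assistant_msg}{eos}"
    return text


if __name__ == "__main__":
    chat = [
        ("Write the opening line of a sci-fi story.", "The last colony ship woke up alone."),
        ("Continue it.", "Its captain had been dead for three hundred years."),
    ]
    print(to_vicuna(chat))
```

Long stories would simply be appended as plain text rows. Mixing chat-formatted and raw prose samples is one way to cover both instruction-following and straight storywriting.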

If anyone wants sizes other than 4bpw, for more/less context or smaller GPUs, just ask.


Running

Since this is a Yi model, try running a lower temperature with ~0.05 MinP, a little repetition penalty, maybe Mirostat with a low tau, and no other samplers. Yi tends to run "hot" by default.
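For readers unfamiliar with these samplers, here is a minimal pure-Python sketch of one decoding step. It applies a CTRL-style repetition penalty, then temperature, then MinP. The function names and the example values (temperature 0.7, penalty 1.05) are illustrative assumptions, not exllamav2 or exui defaults. Backends also differ in whether temperature is applied before or after MinP. Mirostat is not shown.

```python
import math
import random


def apply_repetition_penalty(logits, previous_tokens, penalty=1.05):
    """CTRL-style penalty: lower the logits of tokens that already appeared."""
    out = list(logits)
    for tok in set(previous_tokens):
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.05):
    """Drop tokens less than min_p times as likely as the top token, then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_next(logits, previous_tokens, temperature=0.7, min_p=0.05,
                penalty=1.05, rng=random):
    """One decoding step: repetition penalty, then temperature, then MinP."""
    adjusted = apply_repetition_penalty(logits, previous_tokens, penalty)
    probs = min_p_filter(softmax(adjusted, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [5.0, 4.0, 1.0, -5.0]  # toy 4-token vocabulary
    print([sample_next(logits, previous_tokens=[0], rng=rng) for _ in range(10)])
```

MinP scales its cutoff with the top token's probability. When the model is confident it prunes aggressively, and when many continuations are plausible it keeps them. This is why it pairs well with a model that runs "hot".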

24GB GPUs can run Yi-34B-200K models at 45K-75K context with exllamav2 and performant UIs like exui. I go into more detail in this post.
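For a rough sense of why the usable context depends on quantization size and cache precision, here is a back-of-the-envelope estimate. The KV cache grows linearly with context. This sketch assumes Yi-34B's shape: 60 layers, 8 key/value heads, head dimension 128, and about 34.4B parameters. It ignores activations, the 6-bit head, embeddings, and runtime overhead, so treat the numbers as ballpark only.

```python
# Assumed Yi-34B shape (from its Hugging Face config): 60 decoder layers,
# 8 key/value heads (grouped-query attention), head dimension 128.
LAYERS = 60
KV_HEADS = 8
HEAD_DIM = 128
GIB = 1024 ** 3


def kv_cache_bytes(context_tokens, bits_per_value=16):
    """Bytes of K and V cache for `context_tokens` tokens across all layers."""
    values_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM  # 2 = one K and one V
    return context_tokens * values_per_token * bits_per_value // 8


def weights_bytes(params=34.4e9, bpw=4.0):
    """Very rough quantized weight size, treating bpw as uniform across the model."""
    return int(params * bpw / 8)


if __name__ == "__main__":
    for ctx in (45_000, 75_000):
        for bits in (16, 8):
            size = kv_cache_bytes(ctx, bits) / GIB
            print(f"{ctx:>6} tokens, {bits:>2}-bit cache: {size:5.1f} GiB")
    print(f"4.0 bpw weights: {weights_bytes() / GIB:5.1f} GiB")
```

Under these assumptions, 4bpw weights (about 16 GiB) plus a 45K-token 8-bit cache (about 5.2 GiB) come to roughly 21 GiB. A 75K-token 8-bit cache (about 8.6 GiB) would not fit beside 4bpw weights on a 24GB card. So the top of that range implies a smaller quantization or a more compact cache.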


Commands

First pass:

python convert.py --in_dir /home/alpha/FastModels/jondurbin_bagel-dpo-34b-v0.2 -o /home/alpha/FastModels/scratch -om /home/alpha/FastModels/bagelmeas.json --cal_dataset /home/alpha/Documents/stories.parquet -ml 32768 -mr 7 -ss 4096 -b 4.0 -hb 6 -nr

Second pass:

python convert.py --in_dir /home/alpha/FastModels/jondurbin_bagel-dpo-34b-v0.2 -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/bagelmeas.json --cal_dataset /home/alpha/Documents/stories.parquet -l 12288 -r 25 -ml 32768 -mr 9 -ss 4096 -b 4.0 -hb 6 -cf /home/alpha/FastModels/jondurbin_bagel-dpo-34b-v0.2-exl2-4bpw-fiction -nr
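The two commands differ mainly in how much calibration text each pass uses. This sketch assumes exllamav2's `convert.py` reads `-mr`/`-ml` as rows and tokens per row for the measurement pass, and `-r`/`-l` as rows and tokens per row for the quantization pass. Under that assumption, the budgets work out as follows, and the second pass lines up with the ~300K-token figure above. The values are upper bounds that are reached only if the dataset holds enough text.

```python
# Assumed flag meanings for exllamav2's convert.py:
#   -mr / -ml : rows and tokens per row for the measurement pass
#   -r  / -l  : rows and tokens per row for the quantization pass
PASSES = {
    "first pass (measurement, -mr 7 -ml 32768)": (7, 32768),
    "second pass (quantization, -r 25 -l 12288)": (25, 12288),
}


def calibration_tokens(rows, length):
    """Upper bound on calibration tokens: `rows` samples of `length` tokens each."""
    return rows * length


if __name__ == "__main__":
    for name, (rows, length) in PASSES.items():
        print(f"{name}: {calibration_tokens(rows, length):,} tokens")
    # first pass (measurement, -mr 7 -ml 32768): 229,376 tokens
    # second pass (quantization, -r 25 -l 12288): 307,200 tokens
```

Long rows (32K and 12K tokens, versus exllamav2's shorter defaults) are what make this a long-context calibration.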

Datasets used to train brucethemoose/jondurbin_bagel-dpo-34b-v0.2-exl2-4bpw-fiction