
# BEE-spoke-data/Jamba-900M-doc-writer

To test it out, try this notebook.

This model produces long, surprisingly coherent output that extends input text; you can see an example here, a generated textbook about underwater city design.
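If you'd rather not use the notebook, a minimal sketch of loading the model with `transformers` looks like the following (assuming `transformers` >= 4.40, which added Jamba support, plus `torch` and `accelerate`; the sampling settings are illustrative, not tuned recommendations):

```python
# minimal sketch: load the model and extend a short prompt.
# sampling parameters below are illustrative assumptions, not the author's values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/Jamba-900M-doc-writer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # requires accelerate
)

prompt = "Introduction\n\nThe design of underwater cities presents unique challenges."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,      # raise this for longer continuations
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```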


Thanks to the Jamba architecture, it uses relatively little VRAM while generating: roughly 2.5 GB of VRAM to generate 12,288 tokens.
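As a rough way to sanity-check memory use on your own hardware, you can track peak allocation during a long generation (continuing from the snippet above; CUDA only, and note the ~2.5 GB figure is as reported, not re-measured here):

```python
# rough VRAM check for a long generation; reuses `tokenizer`, `model`, and
# `prompt` from the previous snippet. CUDA only.
import torch

torch.cuda.reset_peak_memory_stats()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_new_tokens=12288, do_sample=True)
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```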

## Model description

This model is a fine-tuned version of pszemraj/jamba-900M-v0.13-KIx2 on some textbook data.

It achieves the following results on the evaluation set:

  • Loss: 3.0200
  • Accuracy: 0.4544
  • Num input tokens seen: 4,940,890,112

## Intended Uses & Limitations

  • Long-context generation.
  • It requires a fairly long prompt (e.g., an 'Introduction' section) to be coaxed into consistently producing long, textbook-like text; see the prompt sketch after this list.
  • The model itself is small, so its reasoning, knowledge, etc. are limited, but still impressive for its size (hidden size 1024).
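As one hypothetical example of such a prompt (the topic and wording below are illustrative, not taken from the model card):

```python
# hypothetical prompt; a document title plus a substantial "Introduction" section
# tends to coax the model into long, textbook-style continuations.
prompt = (
    "Underwater City Design\n\n"
    "Introduction\n\n"
    "The prospect of building permanent settlements beneath the ocean surface raises "
    "questions of structural engineering, life support, energy, and logistics. "
    "This text surveys the core constraints that shape any workable design, beginning "
    "with pressure and materials, then moving to habitability and transport."
)
```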
