README.md · dranger003/dbrx-instruct-iMat.GGUF at d3331859e79a7f86b9e37cee60f8806492fae8e3

metadata

license: other
license_name: databricks-open-model-license
library_name: gguf
license_link: https://www.databricks.com/legal/open-model-license
pipeline_tag: text-generation
base_model: databricks/dbrx-instruct

2024-04-06: Support for this model is still being worked on - PR #6515.

GGUF importance matrix (imatrix) quants for https://huggingface.co/databricks/dbrx-instruct
The importance matrix is trained for ~100K tokens (200 batches of 512 tokens) using wiki.train.raw.
Which GGUF is right for me? (from Artefact2)
The imatrix is being used on the K-quants as well (only for < Q6_K).
You can merge GGUFs with gguf-split --merge <first-chunk> <output-file> although this is not required since f482bb2e.

DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2. This provides 65x more possible combinations of experts and we found that this improves model quality. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA). It uses the GPT-4 tokenizer as provided in the tiktoken repository. We made these choices based on exhaustive evaluation and scaling experiments.

Layers	Context	Template
	32768	<\|im_start\|> system {system} <\|im_end\|> <\|im_start\|> user {prompt} <\|im_end\|> <\|im_start\|> assistant