QuantLM 560M 3-bit

QuantLM 560M with its 3-bit weights unpacked to FP16, so the checkpoint can be run with standard FP16 GEMM (matrix-multiply) kernels. After unpacking, QuantLM has the same architecture as LLaMA.
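"Unpacking" here means expanding the low-bit weight codes back into ordinary FP16 values, which is what lets plain FP16 GEMMs consume them. The sketch below illustrates the idea under loudly stated assumptions: a generic uniform quantizer with a single scale and zero-point, and a hypothetical little-endian bit layout. QuantLM's actual packing format, grouping, and scale storage are not described in this card, and all function names are illustrative only.

```python
import struct
from typing import List

BITS = 3                  # 3-bit codes, as in this QuantLM variant
MASK = (1 << BITS) - 1    # 0b111: selects one 3-bit code

def pack_codes(codes: List[int]) -> bytes:
    """Pack 3-bit codes (0..7) into a little-endian bitstream (hypothetical layout)."""
    acc = 0
    for i, c in enumerate(codes):
        if not 0 <= c <= MASK:
            raise ValueError(f"code {c} does not fit in {BITS} bits")
        acc |= c << (i * BITS)           # code i occupies bits [3i, 3i+3)
    n_bytes = (len(codes) * BITS + 7) // 8
    return acc.to_bytes(n_bytes, "little")

def unpack_codes(packed: bytes, n: int) -> List[int]:
    """Recover n 3-bit codes from the bitstream produced by pack_codes."""
    acc = int.from_bytes(packed, "little")
    return [(acc >> (i * BITS)) & MASK for i in range(n)]

def dequantize(codes: List[int], scale: float, zero: int) -> List[float]:
    """Assumed uniform affine dequantization: w = scale * (code - zero)."""
    return [scale * (c - zero) for c in codes]

def to_fp16(values: List[float]) -> List[float]:
    """Round each value to IEEE half precision (struct format 'e')."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]

codes = [0, 3, 7, 5, 1, 6]
packed = pack_codes(codes)               # 6 codes * 3 bits = 18 bits -> 3 bytes
weights = to_fp16(dequantize(unpack_codes(packed, len(codes)), scale=0.25, zero=4))
print(weights)
```

Once weights are stored in this dequantized FP16 form, they are just regular dense tensors. That is why the unpacked checkpoint loads as an ordinary LLaMA-style model, at the cost of using FP16 memory rather than 3-bit memory.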

import transformers as tf, torch
model_name = "SpectraSuite/QuantLM_560M_3bit_Unpacked"
# Adjust temperature, repetition penalty, top_k, top_p and other sampling parameters to suit your needs.
# device_map="auto" requires the accelerate package to be installed.
pipeline = tf.pipeline("text-generation", model=model_name, model_kwargs={"torch_dtype": torch.float16}, device_map="auto")
# These are base (pretrained) LLMs, not instruction- or chat-tuned models, so phrase your prompt as text to be continued.
print(pipeline("Once upon a time")[0]["generated_text"])
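To make the sampling parameters mentioned above more concrete, here is a simplified, self-contained sketch of how temperature, top-k, and top-p (nucleus) filtering conceptually reshape a next-token distribution. It is an illustration only, not the transformers implementation; repetition penalty is omitted, and the toy logits and function name are hypothetical.

```python
import math
from typing import Dict

def sampling_distribution(logits: Dict[str, float], temperature: float = 1.0,
                          top_k: int = 0, top_p: float = 1.0) -> Dict[str, float]:
    """Return the renormalized next-token distribution after
    temperature scaling, then top-k, then top-p filtering (simplified)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())                      # subtract max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    z = sum(exps.values())
    probs = sorted(((tok, e / z) for tok, e in exps.items()),
                   key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        probs = probs[:top_k]                     # keep only the k most likely tokens
    if top_p < 1.0:
        kept, cum = [], 0.0
        for tok, p in probs:                      # smallest prefix whose mass reaches top_p
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        probs = kept
    total = sum(p for _, p in probs)
    return {tok: p / total for tok, p in probs}   # renormalize the surviving tokens

toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0}
print(sampling_distribution(toy_logits, temperature=0.7, top_k=2))
```

In practice these settings are passed as generation keyword arguments to the pipeline call, for example `pipeline("Once upon a time", do_sample=True, temperature=0.7, top_k=50, top_p=0.9, repetition_penalty=1.1, max_new_tokens=64)`; the values shown are placeholders to tune, not recommendations from the model authors.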
Format: Safetensors · Model size: 569M params · Tensor type: FP16
