Tags: Transformers, GGUF, vllm, conversational

DESCRIPTION

This quantized model does not necessarily reflect the intended quality of the original model.

To perform this quantization, we started from llama.cpp and modified convert_hf_to_gguf_update.py to support this model, basing our changes on PR https://github.com/ggerganov/llama.cpp/pull/11716.

Note: as of this writing, mainline llama.cpp does not support this model or its chat template (see https://github.com/ggerganov/llama.cpp/pull/11716).
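
For context, convert_hf_to_gguf_update.py identifies each model's pre-tokenizer by encoding a fixed probe string with the Hugging Face tokenizer and hashing the resulting token IDs; the hash is then registered so the converter can recognize the model. A minimal sketch of that mechanism, assuming the Almawave/Velvet-2B tokenizer is accessible (the probe text below is a placeholder, not the exact string used by llama.cpp):

from hashlib import sha256
from transformers import AutoTokenizer

# Load the tokenizer of the model being added.
tokenizer = AutoTokenizer.from_pretrained("Almawave/Velvet-2B")

# llama.cpp hashes the token IDs produced for a fixed probe text;
# the real script uses a long multilingual probe string.
chktxt = "Hello world! 3.14 éèê"  # placeholder probe text
chktok = tokenizer.encode(chktxt)
chkhsh = sha256(str(chktok).encode()).hexdigest()

# This hash is what gets registered so the converter can map the
# model's pre-tokenizer to the right implementation.
print(chkhsh)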

PROMPT FORMAT

Basic prompt format:

<s><instruction>{prompt}</instruction>

Prompt format with system message:

<s><instruction>{system_prompt}
{prompt}</instruction>
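
For example, a small helper that assembles these templates (the tag names come verbatim from the formats above; the function itself is ours, not part of the model's tooling):

def build_prompt(prompt: str, system_prompt: str | None = None) -> str:
    # Wrap the user (and optional system) text in the <instruction> template.
    if system_prompt:
        return f"<s><instruction>{system_prompt}\n{prompt}</instruction>"
    return f"<s><instruction>{prompt}</instruction>"

print(build_prompt("What is the capital of Italy?"))
# -> <s><instruction>What is the capital of Italy?</instruction>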

DOWNLOAD

Original Model: https://huggingface.co/Almawave/Velvet-2B
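
To fetch a quantized file programmatically, something like the following works with huggingface_hub; the GGUF filename below is an assumption and should be checked against the repository's file list:

from huggingface_hub import hf_hub_download

# Filename is illustrative; check the repo for the actual GGUF names.
path = hf_hub_download(
    repo_id="SistInf/Velvet-2B-GGUF",
    filename="Velvet-2B-Q4_K_M.gguf",  # hypothetical filename
)
print(path)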

LICENSE

Velvet-2B is made available under the Apache 2.0 license.

GGUF DETAILS

Model size: 2.22B params
Architecture: llama
Available quantizations: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
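
To run one of the quantized files with llama-cpp-python, a sketch like the one below should work, assuming a llama.cpp build that includes the tokenizer changes referenced above; the model path is hypothetical:

from llama_cpp import Llama

# Point this at a downloaded GGUF file; the name here is illustrative.
llm = Llama(model_path="Velvet-2B-Q4_K_M.gguf", n_ctx=2048)

# Use the raw prompt format documented above, since llama.cpp does not
# ship a chat template for this model.
prompt = "<s><instruction>What is the capital of Italy?</instruction>"
out = llm(prompt, max_tokens=128)
print(out["choices"][0]["text"])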


MODEL TREE

Base model: Almawave/Velvet-2B
This model: SistInf/Velvet-2B-GGUF (quantized)