gemma-2b-orpo-GGUF

This is a GGUF quantized version of the gemma-2b-orpo model: an ORPO fine-tune of google/gemma-2b.

You can find more information, including evaluation results and a training/usage notebook, in the gemma-2b-orpo model card.

🎮 Model in action

The model can run with all the libraries that are part of the Llama.cpp ecosystem.

If you need to apply the prompt template manually, take a look at the tokenizer_config.json of the original model.
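As a minimal sketch, the chat template appears to follow the ChatML format (inferred from the prompt string used in the text-generation example below; check the tokenizer_config.json of the original model to confirm):

```python
# Hypothetical helper: build the ChatML-style prompt this model expects.
# The exact template should be verified against tokenizer_config.json.
def build_prompt(user_message: str) -> str:
    return (
        "<bos><|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_prompt("Name the planets in the solar system"))
```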

📱 Run the model on a budget smartphone -> see my recent post

Here is a simple example with llama-cpp-python:

! pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="anakin87/gemma-2b-orpo-GGUF",
    filename="gemma-2b-orpo.Q5_K_M.gguf",
    verbose=True  # due to a known bug, verbose must be True
)

# text generation - prompt template applied manually
llm("<bos><|im_start|>user\nName the planets in the solar system<|im_end|>\n<|im_start|>assistant\n", max_tokens=75)

# chat completion - prompt template automatically applied
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "Please list some places to visit in Italy"
        }
    ]
)
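create_chat_completion returns an OpenAI-style response dict. A minimal sketch of pulling out the assistant's reply (the response shown here is a hypothetical example, not real model output):

```python
# Hypothetical example response in the OpenAI-compatible shape that
# llama-cpp-python's create_chat_completion returns.
example_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Rome, Florence, Venice..."}}
    ]
}

def extract_reply(response: dict) -> str:
    # The assistant text lives under choices[0].message.content.
    return response["choices"][0]["message"]["content"]

print(extract_reply(example_response))
```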