---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: anakin87/gemma-2b-orpo
tags:
- orpo
datasets:
- alvarobartt/dpo-mix-7k-simplified
language:
- en
---
<img src="https://huggingface.co/anakin87/gemma-2b-orpo/resolve/main/assets/gemma-2b-orpo.png" width="450"></img>
# gemma-2b-orpo-GGUF
This is a GGUF quantized version of the [`gemma-2b-orpo` model](https://huggingface.co/anakin87/gemma-2b-orpo/):
an ORPO fine-tune of google/gemma-2b.
You can find more information, including evaluation and a training/usage notebook, in the [`gemma-2b-orpo` model card](https://huggingface.co/anakin87/gemma-2b-orpo/).
## 🎮 Model in action
The model runs with any of the libraries in the Llama.cpp ecosystem.
If you need to apply the prompt template manually, take a look at the [tokenizer_config.json of the original model](https://huggingface.co/anakin87/gemma-2b-orpo/blob/main/tokenizer_config.json).
📱 **Run the model on a budget smartphone** -> [see my recent post](https://www.linkedin.com/posts/stefano-fiorucci_llm-genai-edgecomputing-activity-7183365537618411520-PU2s)
Here is a simple example with **llama-cpp-python**:
```python
# Install the bindings first; in a notebook: ! pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Download the quantized file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="anakin87/gemma-2b-orpo-GGUF",
    filename="gemma-2b-orpo.Q5_K_M.gguf",
    verbose=True,  # for a known bug, verbose must be True
)

# text generation - prompt template applied manually
llm("<bos><|im_start|> user\nName the planets in the solar system<|im_end|>\n<|im_start|>assistant\n", max_tokens=75)

# chat completion - prompt template automatically applied
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "Please list some places to visit in Italy"
        }
    ]
)
```
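Both calls above return OpenAI-style dictionaries rather than plain strings. The snippet below is a minimal sketch for pulling out the generated text, assuming the usual llama-cpp-python response layout (`choices[0]["text"]` for text generation, `choices[0]["message"]["content"]` for chat completion). The helper name `extract_text` and the sample dictionaries are hypothetical, written by hand for illustration; they are not real model output.

```python
from typing import Any, Mapping


def extract_text(response: Mapping[str, Any]) -> str:
    """Return the generated text from a completion or chat-completion response."""
    choice = response["choices"][0]
    # Chat completions nest the text under "message"; plain completions use "text"
    if "message" in choice:
        return choice["message"]["content"]
    return choice["text"]


# Hand-written stand-ins for real responses (assumed shape, not actual model output)
completion_like = {"choices": [{"text": "placeholder completion"}]}
chat_like = {"choices": [{"message": {"role": "assistant", "content": "placeholder reply"}}]}

print(extract_text(completion_like))
print(extract_text(chat_like))
```

With a loaded model, the same helper could be applied directly, for example `extract_text(llm.create_chat_completion(messages=[...]))`.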