|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- role-play |
|
- fine-tuned |
|
- qwen2 |
|
- llama-cpp |
|
- gguf-my-repo |
|
base_model: oxyapi/oxy-1-micro |
|
library_name: transformers |
|
--- |
|
|
|
# Triangle104/oxy-1-micro-Q5_K_M-GGUF |
|
This model was converted to GGUF format from [`oxyapi/oxy-1-micro`](https://huggingface.co/oxyapi/oxy-1-micro) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space. |
|
Refer to the [original model card](https://huggingface.co/oxyapi/oxy-1-micro) for more details on the model. |
|
|
|
--- |
|
Model details: |
|
- |
|
Oxy 1 Micro is a fine-tuned version of the Qwen2-1.5B language model, specialized for role-play |
|
scenarios. Despite its small size, it delivers impressive performance |
|
in generating engaging dialogues and interactive storytelling. |
|
|
|
|
|
Developed by Oxygen (oxyapi), with contributions from TornadoSoftwares, Oxy 1 Micro aims to provide an accessible and efficient language model for creative and immersive role-play experiences. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Model Details |
|
|
|
|
|
|
|
|
|
Model Name: Oxy 1 Micro |
|
Model ID: oxyapi/oxy-1-micro |
|
Base Model: Qwen/Qwen2-1.5B |
|
Model Type: Chat Completions |
|
License: Apache-2.0 |
|
Language: English |
|
Tokenizer: Qwen/Qwen2.5-1.5B-Instruct |
|
Max Input Tokens: 32,768 |
|
Max Output Tokens: 8,192 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Features |
|
|
|
|
|
|
|
|
|
Fine-tuned for Role-Play: Specially trained to generate dynamic and contextually rich role-play dialogues. |
|
Efficient: Compact model size allows for faster inference and reduced computational resources. |
|
Parameter Support: |
|
temperature |
|
top_p |
|
top_k |
|
frequency_penalty |
|
presence_penalty |
|
max_tokens |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Metadata |
|
|
|
|
|
|
|
|
|
Owned by: Oxygen (oxyapi) |
|
Contributors: TornadoSoftwares |
|
Description: A Qwen2-1.5B fine-tune for role-play; small model but still good. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Usage |
|
|
|
|
|
|
|
|
|
To utilize Oxy 1 Micro for text generation in role-play scenarios, |
|
you can load the model using the Hugging Face Transformers library: |
|
|
|
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("oxyapi/oxy-1-micro") |
|
model = AutoModelForCausalLM.from_pretrained("oxyapi/oxy-1-micro") |
|
|
|
prompt = "You are a wise old wizard in a mystical land. A traveler approaches you seeking advice." |
|
inputs = tokenizer(prompt, return_tensors="pt") |
|
outputs = model.generate(**inputs, max_length=500) |
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
print(response) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Performance |
|
|
|
|
|
|
|
|
|
Performance benchmarks for Oxy 1 Micro are not available at this |
|
time. Future updates may include detailed evaluations on relevant |
|
datasets. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
License |
|
|
|
|
|
|
|
|
|
This model is licensed under the Apache 2.0 License. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Citation |
|
|
|
|
|
|
|
|
|
If you find Oxy 1 Micro useful in your research or applications, please cite it as: |
|
|
|
|
|
@misc{oxy1micro2024, |
|
title={Oxy 1 Micro: A Fine-Tuned Qwen2-1.5B Model for Role-Play}, |
|
author={Oxygen (oxyapi)}, |
|
year={2024}, |
|
howpublished={\url{https://huggingface.co/oxyapi/oxy-1-micro}}, |
|
} |
|
|
|
--- |
|
## Use with llama.cpp |
|
Install llama.cpp through brew (works on Mac and Linux) |
|
|
|
```bash |
|
brew install llama.cpp |
|
|
|
``` |
|
Invoke the llama.cpp server or the CLI. |
|
|
|
### CLI: |
|
```bash |
|
llama-cli --hf-repo Triangle104/oxy-1-micro-Q5_K_M-GGUF --hf-file oxy-1-micro-q5_k_m.gguf -p "The meaning to life and the universe is" |
|
``` |
|
|
|
### Server: |
|
```bash |
|
llama-server --hf-repo Triangle104/oxy-1-micro-Q5_K_M-GGUF --hf-file oxy-1-micro-q5_k_m.gguf -c 2048 |
|
``` |
|
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well. |
|
|
|
Step 1: Clone llama.cpp from GitHub. |
|
``` |
|
git clone https://github.com/ggerganov/llama.cpp |
|
``` |
|
|
|
Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux). |
|
``` |
|
cd llama.cpp && LLAMA_CURL=1 make |
|
``` |
|
|
|
Step 3: Run inference through the main binary. |
|
``` |
|
./llama-cli --hf-repo Triangle104/oxy-1-micro-Q5_K_M-GGUF --hf-file oxy-1-micro-q5_k_m.gguf -p "The meaning to life and the universe is" |
|
``` |
|
or |
|
``` |
|
./llama-server --hf-repo Triangle104/oxy-1-micro-Q5_K_M-GGUF --hf-file oxy-1-micro-q5_k_m.gguf -c 2048 |
|
``` |
|
|