woofwolfy committed on
Commit 4722f32 · verified · 1 Parent(s): fdcb71a

Update README.md

Files changed (1)
  1. README.md +37 -40
README.md CHANGED
@@ -6,46 +6,43 @@ tags:
  - gguf-my-repo
  ---

- # woofwolfy/ArliAI-RPMax-Phi-3.8B-v1.1-Q4_K_M-GGUF
+ # woofwolfy/ArliAI-RPMax-Phi-3.8B-v1.1-Q4_K_M-GGUF-Imatrix
  This model was converted to GGUF format from [`ArliAI/ArliAI-RPMax-Phi-3.8B-v1.1`](https://huggingface.co/ArliAI/ArliAI-RPMax-Phi-3.8B-v1.1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/ArliAI/ArliAI-RPMax-Phi-3.8B-v1.1) for more details on the model.

- ## Use with llama.cpp
- Install llama.cpp through brew (works on Mac and Linux):
-
- ```bash
- brew install llama.cpp
- ```
-
- Invoke the llama.cpp server or the CLI.
-
- ### CLI:
- ```bash
- llama-cli --hf-repo woofwolfy/ArliAI-RPMax-Phi-3.8B-v1.1-Q4_K_M-GGUF --hf-file arliai-rpmax-phi-3.8b-v1.1-q4_k_m-imat.gguf -p "The meaning to life and the universe is"
- ```
-
- ### Server:
- ```bash
- llama-server --hf-repo woofwolfy/ArliAI-RPMax-Phi-3.8B-v1.1-Q4_K_M-GGUF --hf-file arliai-rpmax-phi-3.8b-v1.1-q4_k_m-imat.gguf -c 2048
- ```
-
- Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
-
- Step 1: Clone llama.cpp from GitHub.
- ```bash
- git clone https://github.com/ggerganov/llama.cpp
- ```
-
- Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
- ```bash
- cd llama.cpp && LLAMA_CURL=1 make
- ```
-
- Step 3: Run inference through the main binary:
- ```bash
- ./llama-cli --hf-repo woofwolfy/ArliAI-RPMax-Phi-3.8B-v1.1-Q4_K_M-GGUF --hf-file arliai-rpmax-phi-3.8b-v1.1-q4_k_m-imat.gguf -p "The meaning to life and the universe is"
- ```
- or
- ```bash
- ./llama-server --hf-repo woofwolfy/ArliAI-RPMax-Phi-3.8B-v1.1-Q4_K_M-GGUF --hf-file arliai-rpmax-phi-3.8b-v1.1-q4_k_m-imat.gguf -c 2048
- ```
 
+ # ArliAI-RPMax-3.8B-v1.1
+
+ ## Overview
+
+ This repository is based on the Phi-3.5-Mini-Instruct model and is governed by the MIT license: https://huggingface.co/microsoft/Phi-3.5-mini-instruct
+
+ ## Model Description
+
+ ArliAI-RPMax-3.8B-v1.1 is trained on a diverse set of curated RP datasets with a focus on variety and deduplication. The model is designed to be highly creative and non-repetitive, with a training approach that minimizes repetition.
+
+ Although it is finetuned on the same RPMax v1.1 dataset as usual, this model will not be as good as the 8B and 12B versions of RPMax, due to its lower parameter count and the stricter censoring of Phi 3.5.
+
+ If you want access to the larger RPMax models, you can use them at https://arliai.com, which also directly helps fund our model training.
+
+ Or you can always download them and run them yourself if you have the hardware.
+
+ ### Training Details
+
+ * **Sequence Length**: 16384
+ * **Training Duration**: Approximately 1 day on an RTX 4090
+ * **Epochs**: 1 epoch, to minimize repetition sickness
+ * **QLoRA**: rank 64, alpha 128, resulting in ~2% trainable weights
+ * **Learning Rate**: 0.00001
+ * **Gradient Accumulation**: A low 32, for better learning
+
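+ Below is a hypothetical sketch of this QLoRA recipe using Hugging Face `transformers` and `peft`; it is not the authors' training script, and the base checkpoint, target modules, and batch size are assumptions.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
+ from peft import LoraConfig, get_peft_model
+
+ # Load the base model in 4-bit NF4 (the "Q" in QLoRA).
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+ model = AutoModelForCausalLM.from_pretrained(
+     "microsoft/Phi-3.5-mini-instruct",  # assumed base checkpoint
+     quantization_config=bnb_config,
+ )
+
+ # Rank-64 / alpha-128 adapters, as listed above (~2% trainable weights).
+ lora_config = LoraConfig(
+     r=64,
+     lora_alpha=128,
+     target_modules="all-linear",  # assumption; the card does not specify
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(model, lora_config)
+
+ # One epoch, lr 1e-5, gradient accumulation of 32.
+ training_args = TrainingArguments(
+     output_dir="rpmax-phi-qlora",
+     num_train_epochs=1,
+     learning_rate=1e-5,
+     per_device_train_batch_size=1,  # assumption
+     gradient_accumulation_steps=32,
+ )
+ ```
+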
+ ## Quantization
+
+ The model is available in the following formats:
+
+ * **FP16**: https://huggingface.co/ArliAI/ArliAI-RPMax-Phi-3.8B-v1.1
+ * **GGUF**: https://huggingface.co/ArliAI/ArliAI-RPMax-Phi-3.8B-v1.1-GGUF
+
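+ For example, the Q4_K_M file in this repo can be run directly with llama.cpp (see also the usage section in the previous revision above):
+
+ ```bash
+ llama-cli --hf-repo woofwolfy/ArliAI-RPMax-Phi-3.8B-v1.1-Q4_K_M-GGUF --hf-file arliai-rpmax-phi-3.8b-v1.1-q4_k_m-imat.gguf -p "The meaning to life and the universe is"
+ ```
+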
+ ## Suggested Prompt Format
+
+ Phi 3.5 Instruct Prompt Format
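+
+ For reference, a minimal sketch of that template (the standard Phi 3.5 instruct format; see the upstream Phi-3.5-mini-instruct card or its tokenizer chat template for the authoritative version):
+
+ ```
+ <|system|>
+ {system prompt}<|end|>
+ <|user|>
+ {user message}<|end|>
+ <|assistant|>
+ ```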