Quantization made by Richard Erkhov.

Psyfighter2-13B-vore - GGUF

Model creator: https://huggingface.co/SnakyMcSnekFace/
Original model: https://huggingface.co/SnakyMcSnekFace/Psyfighter2-13B-vore/

Name	Quant method	Size
Psyfighter2-13B-vore.Q2_K.gguf	Q2_K	4.52GB
Psyfighter2-13B-vore.IQ3_XS.gguf	IQ3_XS	4.99GB
Psyfighter2-13B-vore.IQ3_S.gguf	IQ3_S	5.27GB
Psyfighter2-13B-vore.Q3_K_S.gguf	Q3_K_S	5.27GB
Psyfighter2-13B-vore.IQ3_M.gguf	IQ3_M	5.57GB
Psyfighter2-13B-vore.Q3_K.gguf	Q3_K	5.9GB
Psyfighter2-13B-vore.Q3_K_M.gguf	Q3_K_M	5.9GB
Psyfighter2-13B-vore.Q3_K_L.gguf	Q3_K_L	6.45GB
Psyfighter2-13B-vore.IQ4_XS.gguf	IQ4_XS	6.54GB
Psyfighter2-13B-vore.Q4_0.gguf	Q4_0	6.86GB
Psyfighter2-13B-vore.IQ4_NL.gguf	IQ4_NL	6.9GB
Psyfighter2-13B-vore.Q4_K_S.gguf	Q4_K_S	6.91GB
Psyfighter2-13B-vore.Q4_K.gguf	Q4_K	7.33GB
Psyfighter2-13B-vore.Q4_K_M.gguf	Q4_K_M	7.33GB
Psyfighter2-13B-vore.Q4_1.gguf	Q4_1	7.61GB
Psyfighter2-13B-vore.Q5_0.gguf	Q5_0	8.36GB
Psyfighter2-13B-vore.Q5_K_S.gguf	Q5_K_S	8.36GB
Psyfighter2-13B-vore.Q5_K.gguf	Q5_K	8.6GB
Psyfighter2-13B-vore.Q5_K_M.gguf	Q5_K_M	8.6GB
Psyfighter2-13B-vore.Q5_1.gguf	Q5_1	9.1GB
Psyfighter2-13B-vore.Q6_K.gguf	Q6_K	9.95GB
Psyfighter2-13B-vore.Q8_0.gguf	Q8_0	12.88GB

Original model description:

license: llama2
language: en
pipeline_tag: text-generation
inference: false
tags: pytorch, storywriting, finetuned, not-for-all-audiences
base_model: KoboldAI/LLaMA2-13B-Psyfighter2
model_type: llama
prompt_template: >

### Instruction:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Input:

{prompt}

### Response:

Model Card for Psyfighter2-13B-vore

This model is a version of KoboldAI/LLaMA2-13B-Psyfighter2 finetuned to better understand vore context. The primary purpose of this model is to be a storywriting assistant, a conversational model in a chat, and an interactive choose-your-own-adventure text game.

Preliminary support for Adventure Mode has been added, but it is still a work in progress.

This is the FP16-precision version of the model, intended for merging and fine-tuning. To use the model, please see the quantized version and the instructions here: SnakyMcSnekFace/Psyfighter2-13B-vore-GGUF

Model Details

The model behaves similarly to KoboldAI/LLaMA2-13B-Psyfighter2, which it was derived from. Please see the README.md here to learn more.

Updates

09/02/2024 - fine-tuned the model to follow Kobold AI Adventure Mode format
06/02/2024 - fixed errors in training and merging, significantly improving the overall prose quality
05/25/2024 - updated training process, making the model more coherent and improving the writing quality
04/13/2024 - uploaded the first version of the model

Bias, Risks, and Limitations

By design, this model has a strong vorny bias. It's not intended for use by anyone below 18 years old.

Training Details

The model was fine-tuned using a rank-stabilized QLoRA adapter. Training was performed using the Unsloth AI library on Ubuntu 22.04.4 LTS with CUDA 12.1 and PyTorch 2.3.0.

The total training time on an NVIDIA GeForce RTX 4060 Ti is about 26 hours.

After training, the adapter weights were merged into the dequantized model as described in ChrisHayduk's GitHub gist.

The quantized version of the model was prepared using llama.cpp.

QLoRA adapter configuration

Rank: 64
Alpha: 16
Dropout rate: 0.1
Target weights: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
Rank-stabilized LoRA: enabled (use_rslora=True)

Targeting all projections for QLoRA adapter resulted in the smallest loss compared to other combinations, even compared to larger rank adapters.
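To illustrate what the rank-stabilized setting changes, here is a minimal sketch of the scaling factor applied to the low-rank update. It assumes the usual definitions: standard LoRA scales the update by alpha / rank, while rank-stabilized LoRA scales it by alpha / sqrt(rank). The function name `lora_scale` is hypothetical and is not taken from the training code.

```python
import math


def lora_scale(alpha: float, rank: int, use_rslora: bool) -> float:
    """Return the factor that multiplies the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA: alpha / sqrt(rank)
        return alpha / math.sqrt(rank)
    # Standard LoRA: alpha / rank
    return alpha / rank


RANK, ALPHA = 64, 16  # values from the adapter configuration above

print(lora_scale(ALPHA, RANK, use_rslora=True))   # 2.0
print(lora_scale(ALPHA, RANK, use_rslora=False))  # 0.25
```

With rank 64 and alpha 16, the rank-stabilized factor is eight times larger than the standard one, so the same alpha yields a noticeably stronger adapter update.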

Domain adaptation

The initial training phase consists of fine-tuning the adapter on ~55 MiB of free-form text containing stories focused on the vore theme. The text is broken into paragraphs, which are aggregated into training samples of 4096 tokens or less, without crossing document boundaries. Each sample starts with the BOS token (with its label set to -100) and ends with the EOS token. Paragraph breaks are normalized to always consist of two line breaks.
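The aggregation step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual training code: `pack_document`, `make_labels`, and the `count_tokens` callable are hypothetical names, the packing is greedy, and a paragraph that alone exceeds the budget is passed through for the caller to handle.

```python
from typing import Callable, List

MAX_TOKENS = 4096     # maximum sample length from the description above
IGNORE_INDEX = -100   # label value that excludes a token from the loss


def pack_document(
    paragraphs: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Greedily group the paragraphs of ONE document into samples.

    Packing each document separately guarantees that no sample crosses
    a document boundary. Two tokens are reserved for BOS and EOS.
    """
    budget = max_tokens - 2
    samples: List[str] = []
    current: List[str] = []
    for paragraph in paragraphs:
        candidate = "\n\n".join(current + [paragraph])
        if current and count_tokens(candidate) > budget:
            # Adding this paragraph would overflow: close the current sample.
            samples.append("\n\n".join(current))
            current = [paragraph]
        else:
            current.append(paragraph)
    if current:
        samples.append("\n\n".join(current))
    return samples


def make_labels(input_ids: List[int]) -> List[int]:
    """Copy the input ids as labels, masking the leading BOS token."""
    return [IGNORE_INDEX] + list(input_ids[1:])
```

A quick check with a whitespace word counter standing in for a real tokenizer:

```python
def count_words(text: str) -> int:
    return len(text.split())

doc = ["one two three", "four five", "six seven eight nine"]
print(pack_document(doc, count_words, max_tokens=7))
# ['one two three\n\nfour five', 'six seven eight nine']
```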

Dataset pre-processing

The raw-text stories in the dataset are edited as follows:

titles, foreword, tags, and anything not comprising the text of the story are removed
non-ascii characters and chapter separators are removed
stories mentioning underage personas in any context are deleted
names of private characters are randomized
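The mechanical parts of this clean-up (dropping non-ASCII characters, removing chapter separators, and normalizing paragraph breaks) might look like the sketch below. The function name `clean_story` and the separator pattern are assumptions made for illustration; removing titles and tags, filtering stories, and randomizing names need manual or dataset-specific handling and are not covered.

```python
import re

# Assumed separator style: a line made only of 3+ punctuation marks, e.g. "***" or "-----".
SEPARATOR_LINE = re.compile(r"\s*[*\-_=~#]{3,}\s*")


def clean_story(text: str) -> str:
    """Apply the mechanical clean-up steps to one raw story."""
    # Drop every character outside the ASCII range.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Remove chapter-separator lines and trailing whitespace.
    lines = [
        line.rstrip()
        for line in text.splitlines()
        if not SEPARATOR_LINE.fullmatch(line)
    ]
    text = "\n".join(lines)
    # Normalize paragraph breaks to exactly two line breaks.
    text = re.sub(r"\n{2,}", "\n\n", text)
    return text.strip()


raw = "Caf\u00e9 scene.\n\n\n***\n\nNext part.  \n"
print(repr(clean_story(raw)))  # 'Caf scene.\n\nNext part.'
```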

Training parameters

Max. sequence length: 4096 tokens
Samples per epoch: 5085
Number of epochs: 2
Learning rate: 1e-4
Warmup: 64 steps
LR Schedule: cosine
Batch size: 1
Gradient accumulation steps: 1

The training takes ~24 hours on an NVIDIA GeForce RTX 4060 Ti.
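As a worked example of what these parameters imply, the sketch below computes the number of optimizer steps and evaluates a cosine schedule with warmup. The exact schedule implementation is not stated in this card; linear warmup followed by cosine decay to zero is assumed here, and `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Learning rate with linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values from the parameter list above.
samples_per_epoch, epochs = 5085, 2
batch_size, grad_accum = 1, 1
warmup_steps, peak_lr = 64, 1e-4

# One optimizer step per sample, because batch size and accumulation are both 1.
total_steps = samples_per_epoch * epochs // (batch_size * grad_accum)
print(total_steps)  # 10170

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at_step(step, total_steps, warmup_steps, peak_lr))
```

Under these assumptions the rate climbs from zero to 1e-4 over the first 64 steps, then decays back to zero by step 10170.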

Adventure mode SFT

The model is further trained on a private dataset of adventure transcripts in the Kobold AI adventure format, for example:

As you venture deeper into the damp cave, you come across a lone goblin. The vile creature mumbles something to itself as it stares at the glowing text on a cave wall. It doesn't notice your approach.

> You sneak behind the goblin and hit it with the sword.

The dataset is generated by running adventure playthroughs with the model and editing its output as necessary to create a cohesive, evocative narrative. There are a total of 657 player turns in the dataset.

The model is trained on completions only; the loss for the user input tokens is ignored by setting their labels to -100. The prompt is truncated on the left to a maximum length of 2048 tokens.
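A minimal sketch of how one such training example could be assembled from already-tokenized ids. The function name `build_sft_example` and the dictionary layout are assumptions for illustration; only the -100 masking and the 2048-token left truncation come from the description above.

```python
from typing import Dict, List

IGNORE_INDEX = -100        # label value that excludes a token from the loss
MAX_PROMPT_TOKENS = 2048   # prompt budget from the description above


def build_sft_example(
    prompt_ids: List[int],
    completion_ids: List[int],
    max_prompt_tokens: int = MAX_PROMPT_TOKENS,
) -> Dict[str, List[int]]:
    """Concatenate prompt and completion, masking the prompt in the labels.

    max_prompt_tokens must be positive.
    """
    # Truncate on the left so the most recent part of the transcript is kept.
    prompt_ids = list(prompt_ids[-max_prompt_tokens:])
    input_ids = prompt_ids + list(completion_ids)
    # Only completion tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}


example = build_sft_example([1, 2, 3, 4, 5], [6, 7], max_prompt_tokens=3)
print(example["input_ids"])  # [3, 4, 5, 6, 7]
print(example["labels"])     # [-100, -100, -100, 6, 7]
```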

Training parameters

Max. sequence length: 4096 tokens
Samples per epoch: 657
Number of epochs: 2
Learning rate: 1e-5
Warmup: 32 steps
LR Schedule: cosine
Batch size: 1
Gradient accumulation steps: 1

The training takes ~150 minutes on an NVIDIA GeForce RTX 4060 Ti.

Results

The fine-tuned model understands the Kobold AI Adventure Format. It no longer attempts to generate the player's inputs starting with ">", and instead emits the EOS token, allowing the player to take their turn.

Without additional context, the model tends to produce very short responses, 1-2 paragraphs at most. Non-player characters are passive, and the model does not advance the narrative. This behavior can be corrected by setting up the context in the instruct format:

### Instruction: 

Text transcript of a never-ending adventure story, written by the AI assistant. AI assistant uses vivid and evocative language to create a well-written novel. Characters are proactive and take initiative. Think about what goals the characters of the story have and write what they do to achieve those goals. 

### Input: 

<< transcript of the adventure + player's next turn >>

Write a few paragraphs that advance the plot of the story. 

### Response:

(See instructions in SnakyMcSnekFace/Psyfighter2-13B-vore-GGUF for formatting the context in koboldcpp.)
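For readers assembling this context programmatically rather than through koboldcpp, the template above can be filled in along these lines. The helper name `build_context` is hypothetical, and the exact whitespace around each section (including the newline after "### Response:") is an assumption; the koboldcpp instructions in the linked repository remain the authoritative reference.

```python
INSTRUCTION = (
    "Text transcript of a never-ending adventure story, written by the AI assistant. "
    "AI assistant uses vivid and evocative language to create a well-written novel. "
    "Characters are proactive and take initiative. "
    "Think about what goals the characters of the story have "
    "and write what they do to achieve those goals."
)
REQUEST = "Write a few paragraphs that advance the plot of the story."


def build_context(transcript: str, player_turn: str) -> str:
    """Wrap the adventure transcript and the player's next turn in the instruct format."""
    # Player turns are prefixed with ">" in the Kobold AI adventure format.
    story = f"{transcript.rstrip()}\n\n> {player_turn.strip()}"
    return (
        f"### Instruction:\n\n{INSTRUCTION}\n\n"
        f"### Input:\n\n{story}\n\n{REQUEST}\n\n"
        "### Response:\n"
    )


print(build_context(
    "As you venture deeper into the damp cave, you come across a lone goblin.",
    "You sneak behind the goblin and hit it with the sword.",
))
```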

Setting or removing the instructions allows the model to generate accepted/rejected synthetic data samples for KTO. This data can then be used to further steer the model towards better storytelling in Adventure Mode without the need for a specially crafted context.
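One possible reading of this procedure is sketched below. It is an assumption, not the author's pipeline: the completion generated with the instructions is treated as accepted, the one generated without them as rejected, and both are stored against the bare transcript. `generate` is a hypothetical callable that maps a prompt to a completion, and the field names ("prompt", "completion", "label") are an assumed unpaired-preference layout that may need adapting to the training library in use.

```python
from typing import Callable, Dict, List, Union

Sample = Dict[str, Union[str, bool]]


def with_instructions(story: str, instruction: str) -> str:
    """Prepend the storytelling instructions in the instruct format."""
    return f"### Instruction:\n\n{instruction}\n\n### Input:\n\n{story}\n\n### Response:\n"


def make_kto_samples(
    story: str,
    instruction: str,
    generate: Callable[[str], str],
) -> List[Sample]:
    """Create one accepted and one rejected sample for the same transcript."""
    guided = generate(with_instructions(story, instruction))  # assumed better output
    unguided = generate(story)                                # assumed weaker output
    # Both samples use the bare transcript as the prompt, so training can
    # transfer the guided behavior to prompts without instructions.
    return [
        {"prompt": story, "completion": guided, "label": True},
        {"prompt": story, "completion": unguided, "label": False},
    ]
```

Each sample would still need manual review before training, since a guided completion is not guaranteed to be better than an unguided one.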

Adventure mode KTO

TBD