(Not to be confused with Pygmalion 13B and Pygmalion 2 13B.)

Pygmalion 1.3B GGML

This repository contains quantized conversions of the Pygmalion 1.3B checkpoint.

For use with frontends that support GGML quantized GPT-NeoX models, such as KoboldCpp and Oobabooga (with the CTransformers loader).
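
As a quick illustration, the ctransformers library can also load these files directly from Python (a minimal sketch; the file path is a placeholder for wherever you saved the quantized model, and gpt_neox is the ctransformers model type for this architecture):

```python
# Minimal sketch: loading a GGML quantization of Pygmalion 1.3B with ctransformers.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "pygmalion-1.3b.q4_0.bin",  # placeholder: path to the quantized file you downloaded
    model_type="gpt_neox",      # GGML GPT-NeoX architecture
)

print(llm("Hello, my name is", max_new_tokens=32))
```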

Last updated on 2023-09-23.

Model                    | Startup RAM usage (KoboldCpp) | Startup RAM usage (Oobabooga)
pygmalion-1.3b.q4_0.bin  | 1.0 GiB                       | 1.3 GiB
pygmalion-1.3b.q4_1.bin  | 1.1 GiB                       | 1.4 GiB
pygmalion-1.3b.q5_0.bin  | 1.2 GiB                       | 1.5 GiB
pygmalion-1.3b.q5_1.bin  | 1.3 GiB                       | 1.6 GiB
pygmalion-1.3b.q8_0.bin  | 1.7 GiB                       | 2.0 GiB
pygmalion-1.3b.f16.bin   | 2.9 GiB                       | 3.2 GiB

Recommended settings:

Pygmalion 1.3B is a limited model, left in the dust by the Pygmalion project's advancements since its release. That's a shame, because it remains one of the few conversational models that runs on systems with less than 2 GB of RAM, at least until TinyLLaMA and a quantized Phi-1.5 arrive.

Here are some tips to get the best results you can out of this model:

  • Stick to a low temperature, preferably between 0.2 and 0.7.
  • Keep your repetition penalty between 1.0 and 1.02; such tiny values are required for models based on Pythia Deduped. (Both sampling settings appear in code in the sketch after this list.)
  • If using SillyTavern, use the settings shown in the screenshot from the original card. (Screenshot not reproduced here.)
  • Keep character descriptions to a few sentences, similar in length to CharacterAI's 500-character descriptions.
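
In code, the sampling tips above translate to generation parameters roughly as follows (a sketch using ctransformers; the file path and prompt are placeholders, and the values are illustrative picks from the recommended ranges):

```python
# Sketch: applying the recommended sampling settings with ctransformers.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "pygmalion-1.3b.q4_0.bin",  # placeholder path, as in the loading sketch above
    model_type="gpt_neox",
)

# Placeholder prompt in the Pygmalion format described further down this card.
prompt = "Assistant's Persona: A friendly AI.\n\nYou: Hello!\nAssistant:"

output = llm(
    prompt,
    max_new_tokens=128,
    temperature=0.5,          # low temperature, within the 0.2-0.7 range
    repetition_penalty=1.01,  # tiny penalty, within the 1.0-1.02 range
)
print(output)
```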

Notes:

  • KoboldCpp [bfc696f] was tested without OpenBLAS.
  • Oobabooga [895ec9d] was tested with the --model <model> --loader ctransformers --model_type gptneox launch arguments.
  • ggerganov/ggml [8ca2c19] was used for conversion and quantization.
  • The original model is available at PygmalionAI/pygmalion-1.3b.
  • Earlier ggmlv2 quantizations are available here.

Below is the original model card for Pygmalion 1.3B.


Pygmalion 1.3B

Model description

Pygmalion 1.3B is a proof-of-concept dialogue model based on EleutherAI's pythia-1.3b-deduped.

Warning: This model is NOT suitable for use by minors. It will output X-rated content under certain circumstances.

Training data

The fine-tuning dataset consisted of 56MB of dialogue data gathered from multiple sources, which includes both real and partially machine-generated conversations.

Training procedure

Fine-tuning was done using ColossalAI (specifically, with a slightly modified version of their OPT fine-tune example) for around 11.4 million tokens over 5440 steps on a single 24GB GPU. The run took just under 21 hours.

Intended use

The easy way

We provide a notebook with a Gradio UI for playing around with the model without having to manually format inputs. This notebook can be found here.

The manual way

The model can be used as a regular text generation model, but it'll perform best if the input prompt adheres to the following format:

[CHARACTER]'s Persona: [A few sentences about the character you want the model to play]

[DIALOGUE HISTORY]
You: [Your input message here]
[CHARACTER]:

Where [CHARACTER] is, as you can probably guess, the name of the character you want the model to portray, and [DIALOGUE HISTORY] is chat history so the model can have some conversational context to draw from. Ideally it'll be pairs of messages like:

[CHARACTER]: [some dialogue here]
You: [your response to the dialogue above]

Apart from chat history, you can also just add example conversations in [DIALOGUE HISTORY] to show how the character should speak - ideally at the beginning, so it doesn't get confused as to what's conversation history vs. character definition.
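
To make the format concrete, here is one way to assemble such a prompt in Python (a sketch; the character name, persona, and messages are invented examples):

```python
# Sketch: building a Pygmalion-format prompt. All character data here is made up.
character = "Assistant"
persona = "A friendly, helpful AI who answers questions in a warm, casual tone."

# Example dialogue goes first (so it isn't mistaken for real history),
# followed by the actual conversation so far.
history = [
    f"{character}: Hi! What would you like to talk about today?",
    "You: Tell me something interesting about space.",
    f"{character}: Did you know a day on Venus is longer than its year?",
]
user_message = "Wow, really? Tell me more!"

prompt = (
    f"{character}'s Persona: {persona}\n\n"
    + "\n".join(history) + "\n"
    + f"You: {user_message}\n"
    + f"{character}:"
)
print(prompt)
```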

Known issues

  • The model can get stuck repeating certain phrases, or sometimes even entire sentences.
    • We believe this is due to that behavior being present in the training data itself, and plan to investigate and adjust accordingly for future versions.