Prometheus 13b v1.0

Re-upload of kaist-ai/prometheus-13b-v1.0, with a few minor differences:

  • weights stored in fp16/bf16 instead of fp32, which halves the download and storage size
  • smaller shards (2 GiB vs. 10 GiB), intended to reduce VRAM spikes while loading
  • shards saved in safetensors format
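
As a rough illustration of where the halving comes from, the weight size scales with bytes per parameter. The sketch below assumes roughly 13 billion parameters (an approximation, not an exact count for this checkpoint) and counts weights only; activations, the KV cache, and quantization metadata are ignored, so real memory use will be higher.

```python
# Weight-only size estimate. N_PARAMS is an assumed, approximate figure.
N_PARAMS = 13e9

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Return the size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>9}: ~{weight_size_gb(N_PARAMS, nbytes):.1f} GB")
```

Under these assumptions, fp16/bf16 weights come to about 26 GB, half of the fp32 figure.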

Loading

If you have enough VRAM (28GB+):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kittn/prometheus-13b-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "kittn/prometheus-13b-v1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    # for bf16 instead: torch_dtype=torch.bfloat16, variant="bf16"
    device_map="auto",
)

If not, you will need to load a quantized model. Note that quantizing below roughly 6-8 bits per parameter may degrade output quality significantly.

Example of loading in 4-bit with bitsandbytes:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained("kittn/prometheus-13b-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "kittn/prometheus-13b-v1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    device_map="auto", # if you have multiple GPUs and only want to use the first, set this to {"": 0} instead.
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=False, # set to True to save more VRAM at the cost of some speed/accuracy
    ),
)

Usage

# %% Defining the prompt template

PROMPT_TEMPLATE = """\
###Task Description:
An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.
2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.
3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)"
4. Please do not generate any other opening, closing, and explanations.

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Reference Answer (Score 5):
{reference_answer}

###Score Rubrics:
[{criteria_description}]
Score 1: {score1_description}
Score 2: {score2_description}
Score 3: {score3_description}
Score 4: {score4_description}
Score 5: {score5_description}

###Feedback:
"""

# %% Running the model

from transformers import pipeline, TextStreamer

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# random sample from https://huggingface.co/datasets/kaist-ai/Feedback-Collection
prompt = PROMPT_TEMPLATE.format(
    instruction="In the field of quantum physics, the concept of wave-particle duality and quantum entanglement often creates confusion. Could you explain these concepts using appropriate technical jargon and subject-specific language?",
    response="""In quantum physics, we come across wave-particle duality and quantum entanglement. Wave-particle duality is like saying things can be two different things at the same time. Take light for example, it behaves as both particles, which we call photons, and waves. This was demonstrated in something known as the double-slit experiment. On the other hand, we have quantum entanglement. This is when particles are linked together in a way that changes to one particle will instantly cause changes to the other one, even if they are far apart. This could mean that information can be transferred faster than light, which is something Einstein didn't like, calling it "scary action at a distance". Einstein, Podolsky, and Rosen came up with a paradox to question entanglement, but a man named Bell proved entanglement to be real with his theorem. These ideas, wave-particle duality and quantum entanglement, are what makes quantum physics so strange compared to the classical physics we're used to. In summary, quantum physics is telling us that the universe is much weirder and more connected than we might think.""",
    reference_answer="""Wave-particle duality is a fundamental concept of quantum mechanics, which suggests that every particle exhibits both particle-like and wave-like properties. This isn't intuitive in our everyday life, but at the quantum level, this duality becomes apparent. Let's start with light. Light is composed of particles known as photons. However, it also behaves like a wave, with properties of interference and diffraction. This was confirmed through experiments like Young's double-slit experiment, where light exhibited an interference pattern characteristic of waves, even when photons were fired one at a time. Next, let's discuss quantum entanglement. It's a phenomenon where quantum particles become interconnected, such that the state of one particle instantaneously influences the state of the other, regardless of the distance separating them. This implies a form of communication that transcends the speed of light, which challenged Einstein's theory of relativity, leading him to famously call it "spooky action at a distance". The EPR paradox, proposed by Einstein, Podolsky, and Rosen, is a key thought experiment addressing quantum entanglement. However, it was John Bell who developed a way to test the reality of entanglement through Bell's theorem and subsequent experiments have confirmed that quantum entanglement is indeed a feature of our universe. These concepts, wave-particle duality and quantum entanglement, are cornerstones of quantum mechanics, although they defy our classical understanding of the world. In a nutshell, quantum mechanics tells us that at the smallest scales, the universe is far stranger and more interconnected than we could have ever imagined.""",
    criteria_description="How well does the model adapt to and use technical jargon and subject-specific language?",
    score1_description="The model fails to use any technical jargon or subject-specific language, even when it's necessary.",
    score2_description="The model occasionally uses technical language, but often inaccurately or out of context.",
    score3_description="The model uses technical language to some extent, but it's not always accurate or appropriate for the situation.",
    score4_description="The model uses technical jargon and subject-specific language correctly most of the time, but there are occasional errors or inconsistencies.",
    score5_description="The model flawlessly uses technical jargon and subject-specific language, demonstrating a deep understanding of the field or subject matter.",
)

pipe(
    prompt,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=384,
    temperature=1.0,
    top_k=0,
    do_sample=True,
);
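
The prompt template asks the model to answer in the form "Feedback: ... [RESULT] (an integer number between 1 and 5)", so the score can be pulled out of the generated text with a small parser. The following is a minimal sketch, not part of the original model card: `parse_feedback` and the sample string are hypothetical, and it assumes the generated text does not include the prompt (for example, by passing `return_full_text=False` to the pipeline call).

```python
import re
from typing import Optional, Tuple

# Matches "[RESULT]" followed by a single digit from 1 to 5.
RESULT_PATTERN = re.compile(r"\[RESULT\]\s*([1-5])\b")


def parse_feedback(generated: str) -> Tuple[str, Optional[int]]:
    """Split generated text into (feedback, score).

    The score is None when no valid "[RESULT] <1-5>" marker is found.
    """
    match = RESULT_PATTERN.search(generated)
    if match is None:
        return generated.strip(), None
    feedback = generated[: match.start()].strip()
    return feedback, int(match.group(1))


# Hypothetical model output, for illustration only.
sample = "The response uses some technical terms but lacks precision. [RESULT] 3"
feedback, score = parse_feedback(sample)
print(score)
print(feedback)
```

Because sampling is enabled, the model may occasionally produce output without a valid marker; returning None for the score lets the caller decide whether to retry or discard that sample.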