metadata
license: apache-2.0
datasets:
  - tatsu-lab/alpaca
language:
  - en
library_name: transformers

Model Card for dlite-v1-774m

AI Squared's dlite-v1-774m (blog post) is a large language model derived from OpenAI's large GPT-2 model and fine-tuned on a single GPU on a corpus of 50k records (Stanford Alpaca) to help it exhibit chat-based capabilities.

While dlite-v1-774m is not a state-of-the-art model, we believe the level of interactivity achievable with such a small, cheaply trained model is worth showcasing, as it further demonstrates that creating powerful AI capabilities may be much more accessible than previously thought.

Model Description

  • Developed by: AI Squared, Inc.
  • Shared by: AI Squared, Inc.
  • Model type: Large Language Model
  • Language(s) (NLP): EN
  • License: Apache v2.0
  • Finetuned from model: GPT-2 Large (gpt2-large)

Bias, Risks, and Limitations

dlite-v1-774m is not a state-of-the-art language model. It is an experimental technology and is designed for research purposes only. Furthermore, the model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.

Usage

The code below shows how to use dlite-v1-774m with the prompt format on which it was trained. While the model can be used "out of the box" with the transformers library, using the function defined below to create a response will achieve better results.

Load Model and Tokenizer from this Repository Using the transformers Package

from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_id = 'aisquared/dlite-v1-774m'

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side='left')
# device_map='auto' requires the accelerate package to be installed
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map='auto')

Create the Prompt Format and Other Variables

PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

END_KEY = '### End'
RESPONSE_KEY = '### Response:\n'
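
To see what the model actually receives, the following minimal sketch fills the template with a sample instruction. The instruction text is an arbitrary example chosen for illustration, and the template is repeated here only so the snippet runs on its own.

```python
# Same template as in this section, repeated so the snippet is self-contained.
PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

# Hypothetical instruction, used purely for illustration.
example_prompt = PROMPT.format(instruction="List three uses of a paperclip.")
print(example_prompt)
```

The formatted prompt ends with the "### Response:" marker followed by a newline, so the model's generated text begins exactly where the response is expected.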

Create a Function to Retrieve a Response

def create_response(
        instruction,
        model,
        tokenizer,
        do_sample=True,
        max_new_tokens=256,
        top_p=0.92,
        top_k=0,
        **kwargs
):
    """
    Create a response from the model by using a formatted prompt
    """
    # Move the prompt tokens to the model's device; with device_map='auto' the model may be on a GPU.
    input_ids = tokenizer(
        PROMPT.format(instruction=instruction), return_tensors="pt"
    ).input_ids.to(model.device)

    gen_tokens = model.generate(
        input_ids,
        pad_token_id=tokenizer.pad_token_id,
        do_sample=do_sample,
        max_new_tokens=max_new_tokens,
        top_p=top_p,
        top_k=top_k,
        **kwargs,
    )
    decoded = tokenizer.batch_decode(gen_tokens)[0]

    # The response appears after "### Response:".  The model has been trained to append "### End" at the end.
    m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", decoded, flags=re.DOTALL)

    if m:
        return m.group(1).strip()

    # The model might not generate the "### End" sequence before reaching the max tokens.  In this case, return
    # everything after "### Response:".
    m = re.search(r"#+\s*Response:\s*(.+)", decoded, flags=re.DOTALL)
    return m.group(1).strip() if m else None
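
The response-extraction step can be tried without loading the model. The sketch below applies the same two regular expressions to hand-written strings that stand in for decoded model output. The helper name `extract_response` and the sample strings are assumptions made for illustration, not part of the original code or real generations.

```python
import re

def extract_response(decoded):
    """Return the text between '### Response:' and '### End'.

    Falls back to everything after '### Response:' when the end marker is
    missing, and returns None when no response marker is found at all.
    """
    m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", decoded, flags=re.DOTALL)
    if m is None:
        m = re.search(r"#+\s*Response:\s*(.+)", decoded, flags=re.DOTALL)
    return m.group(1).strip() if m else None

# Hand-written stand-ins for decoded output (hypothetical, not real generations).
complete = "### Instruction:\nSay hi.\n\n### Response:\nHello there!\n\n### End"
truncated = "### Instruction:\nSay hi.\n\n### Response:\nHello the"

print(extract_response(complete))
print(extract_response(truncated))
print(extract_response("no markers here"))
```

With the model and tokenizer loaded as shown earlier, a call such as `create_response('Say hi.', model=model, tokenizer=tokenizer)` goes through the same extraction after generation. Because sampling is enabled by default, the returned text will vary between runs.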

Model Performance Metrics

We present results from several benchmarks in the EleutherAI LLM Evaluation Harness for all models in the DLite family. Models are sorted by mean score, ascending, to provide an ordering. These metrics further show that none of the DLite models is state of the art; rather, they indicate that chat-like behaviors in LLMs can be trained almost independently of model size.

| model | openbookqa | arc_easy | winogrande | hellaswag | arc_challenge | piqa | boolq |
|---|---|---|---|---|---|---|---|
| gpt2 | 0.164 | 0.438131 | 0.51618 | 0.289185 | 0.190273 | 0.628945 | 0.487156 |
| dlite-v2-124m | 0.174 | 0.44697 | 0.502762 | 0.291974 | 0.192833 | 0.631665 | 0.520183 |
| dlite-v1-124m | 0.17 | 0.462542 | 0.494081 | 0.293268 | 0.223549 | 0.622416 | 0.502446 |
| gpt2-medium | 0.186 | 0.490741 | 0.531176 | 0.333101 | 0.215017 | 0.676279 | 0.585933 |
| dlite-v2-355m | 0.206 | 0.493687 | 0.524073 | 0.334993 | 0.226109 | 0.670838 | 0.582263 |
| dlite-v1-355m | 0.216 | 0.507576 | 0.496448 | 0.338478 | 0.234642 | 0.664309 | 0.600306 |
| gpt2-large | 0.194 | 0.531566 | 0.553275 | 0.363971 | 0.216724 | 0.703482 | 0.604893 |
| dlite-v2-774m | 0.212 | 0.539562 | 0.5588 | 0.365565 | 0.234642 | 0.700218 | 0.60367 |
| dlite-v1-774m | 0.218 | 0.545875 | 0.562747 | 0.375124 | 0.250853 | 0.698041 | 0.614985 |
| gpt2-xl | 0.224 | 0.582912 | 0.583268 | 0.400418 | 0.25 | 0.708379 | 0.617737 |
| dlite-v1-1.5b | 0.226 | 0.588384 | 0.584846 | 0.401414 | 0.268771 | 0.708379 | 0.624159 |
| dlite-v2-1.5b | 0.226 | 0.59596 | 0.581689 | 0.40719 | 0.273891 | 0.705114 | 0.630887 |
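
The mean-score ordering can be checked with a few lines of arithmetic. The sketch below copies three rows from the table (this model and its two neighbours in the ordering) and takes an unweighted mean over the seven benchmarks. Treating the mean as unweighted is an assumption, since the text does not say how it was computed.

```python
# Scores copied from the table, in column order:
# openbookqa, arc_easy, winogrande, hellaswag, arc_challenge, piqa, boolq
scores = {
    "gpt2-large":    [0.194, 0.531566, 0.553275, 0.363971, 0.216724, 0.703482, 0.604893],
    "dlite-v1-774m": [0.218, 0.545875, 0.562747, 0.375124, 0.250853, 0.698041, 0.614985],
    "gpt2-xl":       [0.224, 0.582912, 0.583268, 0.400418, 0.25, 0.708379, 0.617737],
}

# Unweighted mean across the seven benchmarks (assumed; see the note above).
means = {name: sum(vals) / len(vals) for name, vals in scores.items()}

# Sort model names by mean score, ascending, as in the table.
ranked = sorted(means, key=means.get)
for name in ranked:
    print(f"{name}: {means[name]:.4f}")
```

Under that assumption, the three rows come out in the same order as in the table, with this model between gpt2-large and gpt2-xl.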