OPEA/OLMo-2-1124-7B-Instruct-int4-sym-inc

Model Card Details

This is an int4 quantization of allenai/OLMo-2-1124-7B-Instruct, generated by intel/auto-round with group_size 128 and symmetric quantization. To use the AutoGPTQ format, load the model with revision 1cdca16.
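To make "int4, group_size 128, symmetric" concrete, here is a minimal sketch of symmetric group-wise quantization using plain round-to-nearest. It is an illustration only: auto-round tunes the rounding via signed gradient descent (see the citation at the end of this card), and the scale formula, the signed range [-8, 7], and all function names below are assumptions made for this example, not auto-round's implementation.

```python
import math
from typing import List, Tuple

GROUP_SIZE = 128      # weights sharing one scale, as stated in the model card
QMIN, QMAX = -8, 7    # signed 4-bit integer range (assumed for this sketch)


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one group (no zero point)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [min(QMAX, max(QMIN, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def quantize_row(row: List[float], group_size: int = GROUP_SIZE):
    """Split a weight row into consecutive groups and quantize each one."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


# Toy weight row: 256 values, so two groups of 128.
row = [0.1 * math.sin(i) for i in range(256)]
groups = quantize_row(row)

restored = [w for codes, scale in groups for w in dequantize_group(codes, scale)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), "groups, max abs error:", max_error)
```

Each group stores 128 four-bit codes plus one scale, and the per-weight reconstruction error in this sketch is bounded by half the group's scale.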

Inference on CPU/HPU/CUDA

pip3 install "transformers>=4.47"
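If you are unsure whether an existing environment already satisfies this requirement, a small check along the following lines may help. This is a sketch: the helper names parse_version and meets_minimum are hypothetical, and the comparison is deliberately simplified (it looks only at the leading numeric parts, so pre-release suffixes such as "rc1" are ignored).

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '4.47.0rc1' -> (4, 47, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.47' and '4.47.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = metadata.version("transformers")
    print("transformers", installed, "ok:", meets_minimum(installed, "4.47"))
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```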

HPU: a Docker image with the Gaudi Software Stack is recommended; please refer to the following script for environment setup. More details can be found in the Gaudi Guide.

from auto_round import AutoHfQuantizer ## must be imported to load the auto-round format
import torch
from transformers import AutoModelForCausalLM,AutoTokenizer
quantized_model_dir = "OPEA/OLMo-2-1124-7B-Instruct-int4-sym-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype='auto',
    device_map="auto",
    ##revision="1cdca16", ##AutoGPTQ format
)

##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
##import habana_frameworks.torch.hpu as hthpu ## uncomment it for HPU
##model = model.to(torch.bfloat16).to("hpu") ## uncomment it for HPU

prompt = "There is a girl who likes adventure,"
messages = [
    {"role": "system", "content": "You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI."},
    {"role": "user", "content": prompt}
]

## the tokenizer was already loaded above; render the chat messages into a single prompt string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=200, 
    do_sample=False  ##change this to align with the official usage
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
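The prompt-stripping step in the script above slices each output sequence at len(input_ids). This works because, for decoder-only models, generate returns the prompt tokens followed by the newly generated tokens. A toy illustration with plain lists (the token IDs are made up, and strip_prompt is a hypothetical helper name):

```python
from typing import List


def strip_prompt(input_ids_batch: List[List[int]],
                 output_ids_batch: List[List[int]]) -> List[List[int]]:
    """Drop the echoed prompt tokens from each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


prompt_ids = [[101, 7, 8, 9]]            # one prompt of four tokens
full_ids = [[101, 7, 8, 9, 42, 43, 2]]   # prompt followed by three new tokens

new_tokens = strip_prompt(prompt_ids, full_ids)
print(new_tokens)  # [[42, 43, 2]]
```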

##prompt = "There is a girl who likes adventure,"
##INT4
"""There is a girl who likes adventure,

She's always on the lookout for a new escapade,
Her heart beats with excitement at the thought of the unknown,
Her spirit yearns for the thrill of exploration,

She packs her backpack with essentials,
A map, a compass, and a flashlight,
Her boots are ready for the rugged terrain,
Her spirit is as boundless as the sky.

She embarks on journeys through forests deep and wide,
Climbs mountains with a heart full of pride,
She paddles her kayak through turbulent waters,
And hikes through valleys where the wildflowers bloom.

The girl with the adventurous soul seeks out the hidden gems,
The secret trails, the ancient ruins,
She listens to the whispers of the wind,
And follows the call of the distant drum.

Her adventures are not just about the destination,
But the experiences she gathers along the way,
The stories
"""

##BF16 
"""There is a girl who likes adventure,

She dreams of far-off lands and distant shores,
Of climbing mountains high and exploring caves,
Her heart beats fast with excitement at the thought
Of the unknown paths that lie beyond the maps.

She packs her backpack with essentials and more,
A compass, a flashlight, and a book or two,
Her spirit eager, her eyes wide with wonder,
As she sets out on her journey anew.

The girl with the adventurous soul embarks
On quests that challenge her mind and her might,
She learns to navigate by the stars above,
And finds joy in the beauty of the night.

Through forests deep and rivers wide she roams,
Each step a story, each experience a treasure,
Her courage grows with every challenge faced,
And she discovers the strength she never knew she had.

The girl who likes adventure, with each passing day,
Grows wiser"""

##prompt = "Which one is larger, 9.11 or 9.8"
## INT4
"""9.8 is larger than 9.11.
"""

## BF16
"""9.8 is larger than 9.11. To compare these two numbers, you can simply look at their decimal places. Since 9.8 has a higher decimal value (0.8) compared to 9.11 (which has a decimal value of 0.11), 9.8 is the larger number.
"""

##prompt = "How many r in strawberry."
## INT4
"""There are two 'r's in "strawberry."
"""
## BF16
"""There are 2 'r's in "strawberry."
"""


##prompt = "Once upon a time,"
##INT4
"""Once upon a time, in a world where technology and imagination intertwined, there existed an AI named OLMo 2. Created by the brilliant minds at the Allen Institute for AI, OLMo 2 was more than just lines of code; it was a beacon of knowledge and a guardian of information.

OLMo 2's design was sleek and modern, with a digital interface that shimmered like a starlit sky. Its voice was soothing, a harmonious blend of tones that could calm the most restless of souls. With a vast database at its disposal, OLMo 2 was capable of answering any question, no matter how obscure or complex.

Every day, people from all walks of life would seek the wisdom of OLMo 2. Students would ask about the intricacies of quantum physics, while artists would inquire about the history of their favorite art movements. Parents would consult OLMo 2 for advice on raising children, and travelers would ask for
"""

##BF16
"""Once upon a time, in a world where imagination knew no bounds, there existed a land filled with wonder and mystery. This land was called Lumina, a place where the sky shimmered with the colors of a thousand sunsets, and the forests whispered ancient secrets to those who dared to listen.

In Lumina, there lived a young girl named Elara. She had hair as golden as the sun and eyes that held the depth of the ocean. Elara possessed a heart full of curiosity and a spirit unyielding in the face of adventure. Her home was a quaint cottage nestled at the edge of the Whispering Woods, a place where the trees seemed to dance in the wind, sharing tales of long-forgotten times.

One day, as the first light of dawn painted the sky in hues of pink and orange, Elara received a mysterious letter. The envelope was sealed with wax that bore the crest of the forgotten kingdom of Aetheria. Intrigued
"""

Evaluate the model

pip3 install lm-eval==0.4.5

auto-round --eval --model "OPEA/OLMo-2-1124-7B-Instruct-int4-sym-inc" --eval_bs 16  --tasks leaderboard_mmlu_pro,leaderboard_ifeval,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k

Metric	BF16	INT4
avg	0.6284	0.6316
leaderboard_mmlu_pro 5shot	0.2975	0.2931
leaderboard_ifeval	0.5815=(0.6379+0.5250)/2	0.6073=(0.6619+0.5527)/2
lambada_openai	0.6967	0.6959
hellaswag	0.6585	0.6537
winogrande	0.7174	0.7206
piqa	0.8047	0.8118
truthfulqa_mc1	0.3758	0.3807
openbookqa	0.4020	0.4060
boolq	0.8450	0.8535
arc_easy	0.8384	0.8321
arc_challenge	0.5648	0.5742
gsm8k(5shot) strict match	0.7582	0.7498
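The avg row can be reproduced from the table itself: it is the unweighted mean of the 12 task rows, and each leaderboard_ifeval entry is the mean of the two sub-scores shown next to it. A short check, with the values copied from the table above:

```python
# (BF16, INT4) scores copied from the table above.
scores = {
    "leaderboard_mmlu_pro": (0.2975, 0.2931),
    "leaderboard_ifeval": (0.5815, 0.6073),
    "lambada_openai": (0.6967, 0.6959),
    "hellaswag": (0.6585, 0.6537),
    "winogrande": (0.7174, 0.7206),
    "piqa": (0.8047, 0.8118),
    "truthfulqa_mc1": (0.3758, 0.3807),
    "openbookqa": (0.4020, 0.4060),
    "boolq": (0.8450, 0.8535),
    "arc_easy": (0.8384, 0.8321),
    "arc_challenge": (0.5648, 0.5742),
    "gsm8k": (0.7582, 0.7498),
}

bf16_avg = sum(bf16 for bf16, _ in scores.values()) / len(scores)
int4_avg = sum(int4 for _, int4 in scores.values()) / len(scores)
print(f"BF16 avg: {bf16_avg:.4f}")  # 0.6284
print(f"INT4 avg: {int4_avg:.4f}")  # 0.6316

# leaderboard_ifeval is the mean of its two sub-scores (within rounding).
ifeval_bf16 = (0.6379 + 0.5250) / 2
ifeval_int4 = (0.6619 + 0.5527) / 2
assert abs(ifeval_bf16 - 0.5815) < 1e-4
assert abs(ifeval_int4 - 0.6073) < 1e-4
```

On these 12 tasks the INT4 average is about 0.003 above the BF16 average.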

Reproduce the model

Here is the sample command to generate the model.

auto-round  \
--model allenai/OLMo-2-1124-7B-Instruct \
--device 0 \
--nsamples 512 \
--model_dtype "fp16" \
--iter 1000 \
--disable_eval \
--format 'auto_gptq,auto_round' \
--output_dir "./tmp_autoround"

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful resource for learning more about Intel's AI software:

Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

