xs_blenderbot_onnx (only 168 MB)

ONNX quantized version of the facebook/blenderbot_small-90M model (350 MB).

Faster CPU inference.

Intro

Before usage:

• Download the blender_model.py script from the files in this repo (or fetch it programmatically; see the snippet after this list).

• pip install onnxruntime (the examples below also use the transformers package)
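If you prefer, you can fetch blender_model.py programmatically instead of downloading it by hand. This is an optional sketch using huggingface_hub (assumed to be available in your environment); the repo id and file name are the ones listed in this card:

>>> from huggingface_hub import hf_hub_download
>>> script_path = hf_hub_download(repo_id="remzicam/xs_blenderbot_onnx",
                                  filename="blender_model.py")
>>> print(script_path)  # local cache path; copy the file next to your code or add its folder to sys.path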

You can use the model with the Hugging Face generate function and all of its parameters (see the extra-parameters example at the end of the Usage section).

Usage

With the text generation pipeline

>>> from blender_model import TextGenerationPipeline

>>> max_answer_length = 100
>>> response_generator_pipe = TextGenerationPipeline(max_length=max_answer_length)
>>> utterance = "Hello, how are you?"
>>> response_generator_pipe(utterance)
i am well. how are you? what do you like to do in your free time?

Or you can call the model directly:

>>> from blender_model import OnnxBlender
>>> from transformers import BlenderbotSmallTokenizer
>>> original_repo_id = "facebook/blenderbot_small-90M"
>>> repo_id = "remzicam/xs_blenderbot_onnx"
>>> model_file_names = [
        "blenderbot_small-90M-encoder-quantized.onnx",
        "blenderbot_small-90M-decoder-quantized.onnx",
        "blenderbot_small-90M-init-decoder-quantized.onnx",
    ]
>>> model = OnnxBlender(original_repo_id, repo_id, model_file_names)
>>> tokenizer = BlenderbotSmallTokenizer.from_pretrained(original_repo_id)
>>> max_answer_length = 100
>>> utterance = "Hello, how are you?"
>>> inputs = tokenizer(utterance, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_length=max_answer_length)
>>> response = tokenizer.decode(outputs[0], skip_special_tokens=True)
>>> print(response)
i am well. how are you? what do you like to do in your free time?
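Because generation goes through the standard Hugging Face generate API, you can also pass the usual generation parameters. The values below are illustrative, not recommended defaults:

>>> outputs = model.generate(**inputs,
                             max_length=max_answer_length,
                             num_beams=4,               # beam search instead of greedy decoding
                             no_repeat_ngram_size=3,    # discourage repeated phrases
                             early_stopping=True)
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))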

Credits

To create the model, I adapted code from the https://github.com/siddharth-sharma7/fast-Bart repository.
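For reference, here is a minimal sketch of how such quantized files can be produced with onnxruntime's dynamic (int8 weight) quantization, mirroring the approach used in fast-Bart; it assumes the encoder and decoder graphs have already been exported to ONNX, and the unquantized file names are illustrative:

>>> from onnxruntime.quantization import QuantType, quantize_dynamic
>>> quantize_dynamic("blenderbot_small-90M-encoder.onnx",            # full-precision ONNX graph
                     "blenderbot_small-90M-encoder-quantized.onnx",  # int8-weight output
                     weight_type=QuantType.QInt8)
>>> # repeat for the decoder and init-decoder graphs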
