xs_blenderbot_onnx (only 168 MB)
ONNX quantized version of the facebook/blenderbot_small-90M model (originally 350 MB).
Faster CPU inference.
Intro
Before usage:
• Download the blender_model.py script from the files in this repo (a programmatic alternative is sketched after this list)
• pip install onnxruntime
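If you prefer to fetch the script programmatically, the hf_hub_download helper from the huggingface_hub package (an extra dependency, not mentioned in the steps above) can download it into your local cache; a minimal sketch:
>>>from huggingface_hub import hf_hub_download
>>># downloads blender_model.py from this repo and returns its local path;
>>># to import it, copy it next to your script or add its directory to sys.path
>>>script_path = hf_hub_download(repo_id="remzicam/xs_blenderbot_onnx",
                                 filename="blender_model.py")
>>>print(script_path)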
You can use the model with the Hugging Face generate function and all of its parameters (an example with extra generation parameters follows the direct-call snippet below).
Usage
With the text generation pipeline
>>>from blender_model import TextGenerationPipeline
>>>max_answer_length = 100
>>>response_generator_pipe = TextGenerationPipeline(max_length=max_answer_length)
>>>utterance = "Hello, how are you?"
>>>response_generator_pipe(utterance)
i am well. how are you? what do you like to do in your free time?
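The pipeline object is a plain callable, so it can be reused across turns; a minimal sketch (the example utterances are illustrative, and whether the pipeline keeps dialogue history between calls is not documented here):
>>>for utterance in ["What do you do for fun?", "Do you have any pets?"]:
       print(response_generator_pipe(utterance))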
Or you can call the model directly:
>>>from blender_model import OnnxBlender
>>>from transformers import BlenderbotSmallTokenizer
>>>original_repo_id = "facebook/blenderbot_small-90M"
>>>repo_id = "remzicam/xs_blenderbot_onnx"
>>>model_file_names = [
    "blenderbot_small-90M-encoder-quantized.onnx",
    "blenderbot_small-90M-decoder-quantized.onnx",
    "blenderbot_small-90M-init-decoder-quantized.onnx",
]
>>>tokenizer = BlenderbotSmallTokenizer.from_pretrained(original_repo_id)
>>>model = OnnxBlender(original_repo_id, repo_id, model_file_names)
>>>max_answer_length = 100
>>>utterance = "Hello, how are you?"
>>>inputs = tokenizer(utterance, return_tensors="pt")
>>>outputs = model.generate(**inputs,
                            max_length=max_answer_length)
>>>response = tokenizer.decode(outputs[0],
                               skip_special_tokens=True)
>>>print(response)
i am well. how are you? what do you like to do in your free time?
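Because generate accepts the usual Hugging Face parameters (as noted in the intro), you can also pass, for example, beam-search options; a hedged sketch, assuming OnnxBlender forwards these arguments unchanged:
>>># standard generate kwargs: beam search with n-gram blocking
>>>outputs = model.generate(**inputs,
                            max_length=max_answer_length,
                            num_beams=4,
                            no_repeat_ngram_size=3,
                            early_stopping=True)
>>>print(tokenizer.decode(outputs[0], skip_special_tokens=True))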
Credits
To create the model, I adapted code from the https://github.com/siddharth-sharma7/fast-Bart repository.