|
--- |
|
license: apache-2.0 |
|
--- |
|
# Quantized BERT-base MNLI model with 90% of usntructured sparsity |
|
The pruned and quantized model in the OpenVINO IR. The pruned model was taken from this [source](https://huggingface.co/neuralmagic/oBERT-12-downstream-pruned-unstructured-90-mnli) and quantized with the code below using HF Optimum for OpenVINO: |
|
|
|
```python |
|
from functools import partial |
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
from optimum.intel.openvino import OVConfig, OVQuantizer |
|
|
|
model_id = "neuralmagic/oBERT-12-downstream-pruned-unstructured-90-mnli" #"typeform/distilbert-base-uncased-mnli" |
|
model = AutoModelForSequenceClassification.from_pretrained(model_id) |
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
save_dir = "./nm_mnli_90" |
|
|
|
def preprocess_function(examples, tokenizer): |
|
return tokenizer(examples["premise"], examples["hypothesis"], padding="max_length", max_length=128, truncation=True) |
|
|
|
# Load the default quantization configuration detailing the quantization we wish to apply |
|
quantization_config = OVConfig() |
|
# Instantiate our OVQuantizer using the desired configuration |
|
quantizer = OVQuantizer.from_pretrained(model) |
|
# Create the calibration dataset used to perform static quantization |
|
|
|
calibration_dataset = quantizer.get_calibration_dataset( |
|
"glue", |
|
dataset_config_name="mnli", |
|
preprocess_function=partial(preprocess_function, tokenizer=tokenizer), |
|
num_samples=100, |
|
dataset_split="train", |
|
) |
|
# Apply static quantization and export the resulting quantized model to OpenVINO IR format |
|
quantizer.quantize( |
|
quantization_config=quantization_config, |
|
calibration_dataset=calibration_dataset, |
|
save_directory=save_dir, |
|
) |
|
# Save the tokenizer |
|
tokenizer.save_pretrained(save_dir) |
|
``` |