|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
--- |
|
|
|
# TinyBERT_L-4_H-312_v2 ONNX Model |
|
|
|
This repository provides an ONNX version of the `TinyBERT_L-4_H-312_v2` model, originally developed by the team at [Huawei Noah's Ark Lab](https://arxiv.org/abs/1909.10351) |
|
and ported to Transformers by [Nils Reimers](https://huggingface.co/nreimers). |
|
The model is a compact version of BERT, designed for efficient inference and reduced memory footprint. The ONNX version includes mean pooling of the last hidden layer for convenient feature extraction. |
|
|
|
## Model Overview |
|
|
|
TinyBERT is a smaller version of BERT that maintains competitive performance while significantly reducing the number of parameters and computational cost. This makes it ideal for deployment in resource-constrained environments. The model is based on the work presented in the paper ["TinyBERT: Distilling BERT for Natural Language Understanding"](https://arxiv.org/abs/1909.10351). |
|
|
|
## License |
|
|
|
This model is distributed under the Apache 2.0 License. For more details, please refer to the [license file](https://github.com/huawei-noah/Pretrained-Language-Model/blob/master/TinyBERT/LICENSE) in the original repository. |
|
|
|
## Model Details |
|
|
|
- **Model:** TinyBERT_L-4_H-312_v2 |
|
- **Layers:** 4 |
|
- **Hidden Size:** 312 |
|
- **Pooling:** Mean pooling of the last hidden layer |
|
- **Format:** ONNX |
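
The mean pooling listed above is baked into the ONNX graph, but it is useful to see what it computes: token embeddings from the last hidden layer are averaged, with padding positions excluded via the attention mask. A minimal NumPy sketch of that operation (assuming hidden states of shape `(batch, seq_len, hidden)`; the function name `mean_pool` is illustrative, not part of the model's API):

```python
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

# Toy example: batch of 1, seq_len 3 (last token is padding), hidden size 2
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(hidden, mask))  # [[2. 3.]]
```

Because the padding token is masked out, the padded position contributes nothing to the sentence embedding.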
|
|
|
## Usage |
|
|
|
To use this model, you will need to have `onnxruntime` installed. You can install it via pip: |
|
|
|
```bash |
|
pip install onnxruntime transformers
|
``` |
|
|
|
Below is a Python code snippet demonstrating how to run inference using this ONNX model: |
|
|
|
```python |
|
import onnxruntime as ort |
|
from transformers import AutoTokenizer |
|
|
|
model_path = "TinyBERT_L-4_H-312_v2-onnx"
tokenizer = AutoTokenizer.from_pretrained(model_path)
ort_sess = ort.InferenceSession(model_path + "/tinybert_mean_embeddings.onnx")

sentences = [
    'How many people live in Berlin?',
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]
features = tokenizer(sentences, padding=True, truncation=True, return_tensors="np")

# The ONNX graph does not accept token_type_ids, so drop them from the inputs
onnx_inputs = {k: v for k, v in features.items() if k != 'token_type_ids'}

ort_outs = ort_sess.run(None, onnx_inputs)

# The first output is the mean-pooled sentence embedding (one vector per input)
mean_pooled_output = ort_outs[0]
print("Mean pooled output:", mean_pooled_output)
|
``` |
|
|
|
Make sure to set `model_path` to the local directory that contains the tokenizer files and `tinybert_mean_embeddings.onnx`.
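
The pooled embeddings can be compared with cosine similarity, for example to check that the Berlin question lands closer to the Berlin answer than to the unrelated New York sentence. A minimal sketch (the helper `cosine_sim` is our own, not part of `onnxruntime`; whether the scores are well-calibrated depends on how the underlying model was trained):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With the snippet above, ort_outs[0] holds one pooled embedding per input:
# emb = ort_outs[0]
# print(cosine_sim(emb[0], emb[1]))  # question vs. Berlin answer
# print(cosine_sim(emb[0], emb[2]))  # question vs. unrelated sentence

# Self-contained sanity check: identical vectors have similarity 1.0
print(cosine_sim(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # 1.0
```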
|
|
|
## Training Details |
|
|
|
For detailed information on the training process of TinyBERT, please refer to the [original paper](https://arxiv.org/abs/1909.10351) by Huawei Noah's Ark Lab. |
|
|
|
## Acknowledgements |
|
|
|
This model is based on the work by the team at Huawei Noah's Ark Lab and by Nils Reimers. Special thanks to the developers for providing the pre-trained model and making it accessible to the community. |