---
license: apache-2.0
tags:
- onnx
- ort
---

# ONNX and ORT models with quantization of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

[The Japanese README is available here](README_ja.md)

This repository contains the ONNX and ORT formats of the model [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), along with quantized versions.

## License

The license for this model is "apache-2.0". For details, please refer to the original model page: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).

## Usage

To use this model, install ONNX Runtime and the tokenizer dependencies (for example, `pip install onnxruntime transformers numpy`), then run inference as shown below.

```python
# Example code
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('answerdotai/ModernBERT-base')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort'         # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')

    # Get the model extension
    model_extension = os.path.splitext(model_path)[1]

    # Load the model
    if model_extension == '.ort':
        # Load the ORT format model
        session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    else:
        # Load the ONNX model
        session = ort.InferenceSession(model_path)

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```
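
The loop above prints the raw outputs. If you need a single fixed-size sentence embedding instead, you can pool the token-level hidden states. The following is a minimal sketch that assumes the first output of the session is the last hidden state with shape `(batch, sequence, hidden)`; check `session.get_outputs()` for the actual output names of your export before relying on it.

```python
# Minimal sketch: mean-pool token embeddings into one sentence vector.
# Assumption: outputs[0] from the loop above is the last hidden state,
# shaped (batch, sequence, hidden); `inputs` is the tokenized input from above.
import numpy as np

last_hidden_state = outputs[0]
attention_mask = inputs['attention_mask']        # (batch, sequence)
mask = np.expand_dims(attention_mask, axis=-1)   # (batch, sequence, 1)

# Sum only over real (non-padding) tokens, then divide by the token count.
summed = (last_hidden_state * mask).sum(axis=1)
counts = np.clip(mask.sum(axis=1), 1e-9, None)
sentence_embedding = summed / counts

print('Sentence embedding shape:', sentence_embedding.shape)
```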

## Contents of the Model

This repository includes the following models:

### ONNX Models
- `onnx_models/model.onnx`: Original ONNX model converted from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 quantized model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model

### ORT Models
- `ort_models/model.ort`: ORT model using the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT model using the FP16 quantized model
- `ort_models/model_int8.ort`: ORT model using the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT model using the UINT8 quantized model
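
Any of the quantized files above can be loaded with the same `InferenceSession` call used in the Usage example. The following is a minimal sketch, assuming the file names listed above and that the `tokenizer` from the Usage example is already loaded; it gives a rough sense of how much the INT8 quantized model deviates from the optimized FP32 one on a sample input.

```python
# Minimal sketch: compare the optimized FP32 ONNX model with the INT8 quantized one.
# Assumes the file layout listed above and the `tokenizer` from the Usage example.
import numpy as np
import onnxruntime as ort

example_inputs = dict(tokenizer('Replace this text with your input.', return_tensors='np'))

fp32_session = ort.InferenceSession('onnx_models/model_opt.onnx', providers=['CPUExecutionProvider'])
int8_session = ort.InferenceSession('onnx_models/model_int8.onnx', providers=['CPUExecutionProvider'])

fp32_out = fp32_session.run(None, example_inputs)[0]
int8_out = int8_session.run(None, example_inputs)[0]

# Quantization trades some numerical accuracy for a smaller, faster model;
# the maximum element-wise difference gives a rough sense of that gap.
print('Max absolute difference (FP32 vs INT8):', np.abs(fp32_out - int8_out).max())
```
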
## Notes

Please adhere to the license and usage conditions of the original model [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).

## Contribution

If you find any issues or have improvements, please create an issue or submit a pull request.