	---
	language: en
	tags:
	- text-classification
	- onnx
	- int8
	- roberta
	- emotions
	- multi-class-classification
	- multi-label-classification
	- optimum
	datasets:
	- go_emotions
	license: mit
	inference: false
	widget:
	- text: Thank goodness ONNX is available, it is lots faster!
	---

	This model is the ONNX version of [https://huggingface.co/SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions).

	### Full precision ONNX version

	`onnx/model.onnx` is the full precision ONNX version

	- that has identical accuracy/metrics to the original Transformers model
	- and has the same model size (499MB)
	- is faster in inference than normal Transformers, particularly for smaller batch sizes
	- in my tests, about 2x to 3x as fast for a batch size of 1 on an 8 core 11th gen i7 CPU using ONNXRuntime
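Speed-ups like the one quoted above depend on hardware and batch size, so it can be worth measuring on your own machine. The following is only a minimal sketch of a timing harness: `time_per_call` and `dummy_inference` are hypothetical names introduced here, and the stand-in workload should be swapped for a real inference call.

```python
import statistics
import time


def time_per_call(fn, n_warmup=3, n_runs=20):
    """Return the median wall-clock seconds per call of fn()."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload (assumption); replace with a real call,
# e.g. lambda: classifier(["some text"])
def dummy_inference():
    return sum(i * i for i in range(10_000))


median_seconds = time_per_call(dummy_inference)
print(f"median latency: {median_seconds * 1000:.3f} ms")
```

Timing the Transformers pipeline and the ONNX pipeline with the same inputs and the same harness gives a like-for-like comparison.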

	#### Metrics

	Using a fixed threshold of 0.5 to convert the scores to binary predictions for each label:

	- Accuracy: 0.474
	- Precision: 0.575
	- Recall: 0.396
	- F1: 0.450

	See more details in the SamLowe/roberta-base-go_emotions model card for the increases possible through selecting label-specific thresholds to maximise F1 scores, or another metric.
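As a rough illustration of the fixed-threshold step described above, the sketch below turns a matrix of sigmoid scores into binary predictions. The scores are made up for the example, and whether the original evaluation used `>` or `>=` at the boundary is an assumption.

```python
import numpy as np

# Hypothetical sigmoid scores: 2 inputs x 3 labels (the real model has 28 labels)
scores = np.array([
    [0.92, 0.06, 0.51],
    [0.10, 0.49, 0.03],
])

threshold = 0.5  # same fixed threshold for every label

# 1 where the score exceeds the threshold, 0 otherwise
binary_predictions = (scores > threshold).astype(int)
print(binary_predictions)
```

Because each label is thresholded independently, an input can receive several labels or none at all, as the second row shows.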

	### Quantized (INT8) ONNX version

	`onnx/model_quantized.onnx` is the int8 quantized version

	- that is one quarter the size (125MB) of the full precision model (above)
	- but delivers almost all of the accuracy
	- is faster in inference than both the full precision ONNX above, and the normal Transformers model
	- about 2x as fast for a batch size of 1 on an 8 core 11th gen i7 CPU using ONNXRuntime vs the full precision model above
	- which makes it circa 5x as fast as the full precision normal Transformers model (on the above mentioned CPU, for a batch of 1)

	#### Metrics for Quantized (INT8) Model

	Using a fixed threshold of 0.5 to convert the scores to binary predictions for each label:

	- Accuracy: 0.475
	- Precision: 0.582
	- Recall: 0.398
	- F1: 0.447

	Note how the metrics are almost identical to the full precision metrics above.

	See more details in the SamLowe/roberta-base-go_emotions model card for the increases possible through selecting label-specific thresholds to maximise F1 scores, or another metric.
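One possible way to select label-specific thresholds is a simple grid search per label on held-out data. This is a sketch under stated assumptions, not the procedure used in the referenced model card: the helper names, the candidate grid, and the validation data are all hypothetical.

```python
import numpy as np


def f1_score_binary(y_true, y_pred):
    """F1 for one label given 0/1 arrays; returns 0.0 when undefined."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_thresholds(y_true, scores, candidates):
    """For each label column, pick the candidate threshold with the highest F1."""
    chosen = []
    for j in range(scores.shape[1]):
        f1s = [
            f1_score_binary(y_true[:, j], (scores[:, j] > t).astype(int))
            for t in candidates
        ]
        # np.argmax returns the first maximum, so ties go to the lowest threshold
        chosen.append(float(candidates[int(np.argmax(f1s))]))
    return chosen


# Hypothetical validation data: 4 inputs x 2 labels
y_true = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
scores = np.array([[0.40, 0.15], [0.35, 0.80], [0.20, 0.65], [0.05, 0.45]])
candidates = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

thresholds = best_thresholds(y_true, scores, candidates)
print(thresholds)
```

In this toy data the first label never scores above 0.5, so a fixed 0.5 threshold would miss every positive for it, while a lower per-label threshold recovers them.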

	### How to use

	#### Using Optimum Library ONNX Classes

	The Optimum library has equivalents (with names starting `ORT`) for the main Transformers classes, so these models can be used with the familiar constructs. The only extra argument needed is `file_name` when loading the model, which in the example below specifies the quantized (INT8) model.

	```python
	sentences = ["ONNX is seriously fast for small batches. Impressive"]

	from transformers import AutoTokenizer, pipeline
	from optimum.onnxruntime import ORTModelForSequenceClassification

	model_id = "SamLowe/roberta-base-go_emotions-onnx"
	file_name = "onnx/model_quantized.onnx"

	model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name)
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	onnx_classifier = pipeline(
	    task="text-classification",
	    model=model,
	    tokenizer=tokenizer,
	    top_k=None,
	    function_to_apply="sigmoid",  # optional as is the default for the task
	)

	model_outputs = onnx_classifier(sentences)
	# gives a list of outputs, each a list of dicts (one per label)

	print(model_outputs)
	# E.g.
	# [[{'label': 'admiration', 'score': 0.9203393459320068},
	# {'label': 'approval', 'score': 0.0560273639857769},
	# {'label': 'neutral', 'score': 0.04265536740422249},
	# {'label': 'gratitude', 'score': 0.015126707963645458},
	# ...
	```
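To turn the pipeline output into a list of predicted labels per input, the scores can be filtered against a threshold. A minimal sketch, using the sample scores printed above (truncated to four labels) and assuming the fixed 0.5 threshold from the metrics sections:

```python
# Sample output in the shape returned by the pipeline with top_k=None
# (scores copied from the example output; truncated to four labels)
model_outputs = [[
    {"label": "admiration", "score": 0.9203393459320068},
    {"label": "approval", "score": 0.0560273639857769},
    {"label": "neutral", "score": 0.04265536740422249},
    {"label": "gratitude", "score": 0.015126707963645458},
]]

threshold = 0.5

# Keep only the labels whose score exceeds the threshold, per input
predicted_labels = [
    [item["label"] for item in per_input if item["score"] > threshold]
    for per_input in model_outputs
]
print(predicted_labels)  # [['admiration']]
```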

	#### Using ONNXRuntime

	- Tokenization can be done beforehand with the `tokenizers` library,
	- and the result then fed into ONNXRuntime as the type of dict it uses,
	- and then only the postprocessing sigmoid is needed afterward on the model output (which comes as a numpy array) to create the per-label scores.

	```python
	from tokenizers import Tokenizer
	import onnxruntime as ort

	from os import cpu_count
	import numpy as np # only used for the postprocessing sigmoid

	sentences = ["hello world"] # for example a batch of 1

	# labels as (ordered) list - from the go_emotions dataset
	labels = ['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']

	tokenizer = Tokenizer.from_pretrained("SamLowe/roberta-base-go_emotions")

	# Optional - set pad to only pad to longest in batch, not a fixed length.
	# (without this, the model will run slower, esp for shorter input strings)
	params = {**tokenizer.padding, "length": None}
	tokenizer.enable_padding(**params)

	tokens_obj = tokenizer.encode_batch(sentences)

	def load_onnx_model(model_filepath):
	    _options = ort.SessionOptions()
	    _options.inter_op_num_threads, _options.intra_op_num_threads = cpu_count(), cpu_count()
	    _providers = ["CPUExecutionProvider"]  # could use ort.get_available_providers()
	    return ort.InferenceSession(path_or_bytes=model_filepath, sess_options=_options, providers=_providers)

	model = load_onnx_model("path_to_model_dot_onnx_or_model_quantized_dot_onnx")
	output_names = [model.get_outputs()[0].name] # E.g. ["logits"]

	input_feed_dict = {
	    "input_ids": [t.ids for t in tokens_obj],
	    "attention_mask": [t.attention_mask for t in tokens_obj],
	}

	logits = model.run(output_names=output_names, input_feed=input_feed_dict)[0]
	# produces a numpy array, one row per input item, one col per label

	def sigmoid(x):
	    return 1.0 / (1.0 + np.exp(-x))

	# Post-processing. Gets the scores per label in the range 0 to 1.
	# Auto done by Transformers' pipeline, but we must do it manually with ORT.
	model_outputs = sigmoid(logits)

	# for example, just to show the top result per input item
	for probas in model_outputs:
	    top_result_index = np.argmax(probas)
	    print(labels[top_result_index], "with score:", probas[top_result_index])
	```
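If the same list-of-dicts shape as the Transformers pipeline output is wanted, the score array can be converted as sketched below. `to_pipeline_format` is a hypothetical helper, and the label list and scores are shortened, made-up stand-ins for the 28-label output.

```python
import numpy as np

# Shortened hypothetical label list and scores (the real model has 28 labels)
labels = ["admiration", "amusement", "anger"]
model_outputs = np.array([[0.12, 0.81, 0.03]])


def to_pipeline_format(scores, labels):
    """Convert a (batch, n_labels) score array to lists of dicts, highest score first."""
    results = []
    for row in scores:
        order = np.argsort(row)[::-1]  # indices from highest to lowest score
        results.append([{"label": labels[i], "score": float(row[i])} for i in order])
    return results


formatted = to_pipeline_format(model_outputs, labels)
print(formatted[0][0])
```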

	### Example notebook: showing usage, accuracy & performance

	Notebook with more details to follow.