TrOCR: Optimized for Mobile Deployment

Transformer-based model for state-of-the-art optical character recognition (OCR) on both printed and handwritten text.

End-to-end text recognition approach with pre-trained image transformer and text transformer models for both image understanding and wordpiece-level text generation.

This model is an implementation of TrOCR found here.

This repository provides scripts to run TrOCR on Qualcomm® devices. More details on model performance across various devices can be found here.

Model Details

  • Model Type: Image to text
  • Model Stats:
    • Model checkpoint: trocr-small-stage1
    • Input resolution: 320x320
    • Number of parameters (TrOCREncoder): 23.0M
    • Model size (TrOCREncoder): 87.8 MB
    • Number of parameters (TrOCRDecoder): 38.3M
    • Model size (TrOCRDecoder): 146 MB
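As a quick sanity check, the listed model sizes are consistent with storing every parameter as a 32-bit float, provided the sizes are reported in binary megabytes (MiB). Both of those are assumptions on my part, not something the model card states; the sketch below only redoes the arithmetic.

```python
# Assumptions (not stated in the model card): weights are stored as FP32
# (4 bytes each) and the listed sizes are in MiB (1024 * 1024 bytes).
BYTES_PER_PARAM = 4
MIB = 1024 ** 2


def size_mib(num_params: float) -> float:
    """Approximate storage, in MiB, needed for num_params FP32 weights."""
    return num_params * BYTES_PER_PARAM / MIB


encoder_mib = size_mib(23.0e6)  # TrOCREncoder: 23.0M parameters
decoder_mib = size_mib(38.3e6)  # TrOCRDecoder: 38.3M parameters

print(f"encoder ~ {encoder_mib:.1f} MiB, decoder ~ {decoder_mib:.1f} MiB")
```

Under these assumptions the estimates come out at roughly 87.7 and 146.1, which is within rounding of the listed 87.8 MB and 146 MB.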

| Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model |
|---|---|---|---|---|---|---|---|---|
| TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 2.163 | 0 - 34 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 2.044 | 2 - 5 | FP16 | NPU | TrOCR.so |
| TrOCRDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 2.593 | 0 - 227 | FP16 | NPU | TrOCR.onnx |
| TrOCRDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 1.459 | 0 - 130 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | QNN | 1.517 | 0 - 19 | FP16 | NPU | TrOCR.so |
| TrOCRDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 1.984 | 0 - 138 | FP16 | NPU | TrOCR.onnx |
| TrOCRDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 1.379 | 0 - 62 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 1.246 | 2 - 64 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 1.78 | 1 - 67 | FP16 | NPU | TrOCR.onnx |
| TrOCRDecoder | SA7255P ADP | SA7255P | TFLITE | 11.79 | 0 - 59 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | SA7255P ADP | SA7255P | QNN | 11.696 | 7 - 14 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 2.084 | 0 - 30 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | SA8255 (Proxy) | SA8255P Proxy | QNN | 2.075 | 1 - 3 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | SA8295P ADP | SA8295P | TFLITE | 3.017 | 0 - 52 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | SA8295P ADP | SA8295P | QNN | 2.913 | 7 - 21 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 2.1 | 0 - 30 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | SA8650 (Proxy) | SA8650P Proxy | QNN | 2.321 | 2 - 5 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | SA8775P ADP | SA8775P | TFLITE | 3.142 | 0 - 59 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | SA8775P ADP | SA8775P | QNN | 3.06 | 7 - 17 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 11.79 | 0 - 59 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | QCS8275 (Proxy) | QCS8275 Proxy | QNN | 11.696 | 7 - 14 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 2.064 | 0 - 29 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | QCS8550 (Proxy) | QCS8550 Proxy | QNN | 2.062 | 1 - 4 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 3.142 | 0 - 59 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | QCS9075 (Proxy) | QCS9075 Proxy | QNN | 3.06 | 7 - 17 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 2.658 | 0 - 124 | FP16 | NPU | TrOCR.tflite |
| TrOCRDecoder | QCS8450 (Proxy) | QCS8450 Proxy | QNN | 2.427 | 4 - 126 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 2.212 | 7 - 7 | FP16 | NPU | Use Export Script |
| TrOCRDecoder | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 2.389 | 68 - 68 | FP16 | NPU | TrOCR.onnx |
| TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 37.307 | 8 - 36 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 38.013 | 2 - 4 | FP16 | NPU | TrOCR.so |
| TrOCREncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 37.944 | 14 - 125 | FP16 | NPU | TrOCR.onnx |
| TrOCREncoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 28.402 | 6 - 172 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | QNN | 30.166 | 2 - 21 | FP16 | NPU | TrOCR.so |
| TrOCREncoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 30.119 | 12 - 123 | FP16 | NPU | TrOCR.onnx |
| TrOCREncoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 26.738 | 6 - 171 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 22.69 | 2 - 168 | FP16 | NPU | Use Export Script |
| TrOCREncoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 25.977 | 14 - 124 | FP16 | NPU | TrOCR.onnx |
| TrOCREncoder | SA7255P ADP | SA7255P | TFLITE | 253.63 | 4 - 167 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | SA7255P ADP | SA7255P | QNN | 249.787 | 2 - 9 | FP16 | NPU | Use Export Script |
| TrOCREncoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 37.287 | 7 - 32 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | SA8255 (Proxy) | SA8255P Proxy | QNN | 38.012 | 2 - 4 | FP16 | NPU | Use Export Script |
| TrOCREncoder | SA8295P ADP | SA8295P | TFLITE | 51.229 | 7 - 168 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | SA8295P ADP | SA8295P | QNN | 50.73 | 2 - 16 | FP16 | NPU | Use Export Script |
| TrOCREncoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 37.283 | 7 - 35 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | SA8650 (Proxy) | SA8650P Proxy | QNN | 37.826 | 2 - 4 | FP16 | NPU | Use Export Script |
| TrOCREncoder | SA8775P ADP | SA8775P | TFLITE | 45.975 | 7 - 170 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | SA8775P ADP | SA8775P | QNN | 43.717 | 2 - 12 | FP16 | NPU | Use Export Script |
| TrOCREncoder | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 253.63 | 4 - 167 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | QCS8275 (Proxy) | QCS8275 Proxy | QNN | 249.787 | 2 - 9 | FP16 | NPU | Use Export Script |
| TrOCREncoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 37.453 | 7 - 32 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | QCS8550 (Proxy) | QCS8550 Proxy | QNN | 37.864 | 2 - 4 | FP16 | NPU | Use Export Script |
| TrOCREncoder | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 45.975 | 7 - 170 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | QCS9075 (Proxy) | QCS9075 Proxy | QNN | 43.717 | 2 - 12 | FP16 | NPU | Use Export Script |
| TrOCREncoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 46.089 | 7 - 172 | FP16 | NPU | TrOCR.tflite |
| TrOCREncoder | QCS8450 (Proxy) | QCS8450 Proxy | QNN | 47.168 | 2 - 171 | FP16 | NPU | Use Export Script |
| TrOCREncoder | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 35.409 | 2 - 2 | FP16 | NPU | Use Export Script |
| TrOCREncoder | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 36.327 | 51 - 51 | FP16 | NPU | TrOCR.onnx |
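
The encoder and decoder timings are reported separately, so they need to be combined to reason about a full recognition pass. Assuming the usual autoregressive setup (the encoder runs once per image, the decoder runs once per generated token), a rough estimate is sketched below. The token count is a hypothetical value, and pre-processing, post-processing, and data-transfer overheads are ignored.

```python
def estimate_latency_ms(encoder_ms: float, decoder_ms: float, num_tokens: int) -> float:
    """One encoder pass plus one decoder pass per generated token."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return encoder_ms + decoder_ms * num_tokens


# Figures from the Samsung Galaxy S23 TFLITE rows of the table.
S23_ENCODER_MS = 37.307
S23_DECODER_MS = 2.163

# Hypothetical output length; real lengths depend on the text in the image.
NUM_TOKENS = 20

total_ms = estimate_latency_ms(S23_ENCODER_MS, S23_DECODER_MS, NUM_TOKENS)
print(f"Estimated latency for {NUM_TOKENS} tokens: {total_ms:.1f} ms")
```

With these inputs the estimate is about 80.6 ms. For longer outputs the per-token decoder cost dominates, so it is worth looking at both rows rather than at the encoder time alone.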

Installation

Install the package via pip:

pip install "qai-hub-models[trocr]"

Configure Qualcomm® AI Hub to run this model on a cloud-hosted device

Sign in to Qualcomm® AI Hub with your Qualcomm® ID. Once signed in, navigate to Account -> Settings -> API Token.

With this API token, you can configure your client to run models on the cloud-hosted devices.

qai-hub configure --api_token API_TOKEN

Navigate to docs for more information.

Demo off target

The package contains a simple end-to-end demo that downloads pre-trained weights and runs this model on a sample input.

python -m qai_hub_models.models.trocr.demo

The above demo runs a reference implementation of pre-processing, model inference, and post-processing.

NOTE: If you are running in a Jupyter Notebook or a Google Colab-like environment, add the following to your cell instead of the command above.

%run -m qai_hub_models.models.trocr.demo
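
Conceptually, the demo's three stages map onto one encoder pass followed by a token-by-token decoding loop. The sketch below illustrates only that control flow with toy stand-ins: `toy_encoder`, `toy_decoder_step`, the vocabulary, and the token IDs are all hypothetical and are not the package's implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical toy vocabulary; the real model uses a learned subword vocabulary.
VOCAB = ["<s>", "</s>", "he", "llo"]
BOS_ID, EOS_ID = 0, 1


def toy_encoder(pixels: Sequence[float]) -> List[float]:
    """Stand-in for pre-processing plus the image encoder."""
    return [p / 255.0 for p in pixels]  # scale pixel values to [0, 1]


def toy_decoder_step(features: List[float], tokens: List[int]) -> int:
    """Stand-in for one decoder pass: returns the next token ID."""
    script = [2, 3, EOS_ID]  # fixed output so the example is deterministic
    return script[min(len(tokens) - 1, len(script) - 1)]


def greedy_decode(
    features: List[float],
    step: Callable[[List[float], List[int]], int],
    max_len: int = 10,
) -> List[int]:
    """Call the decoder repeatedly until it emits EOS or max_len is reached."""
    tokens = [BOS_ID]
    for _ in range(max_len):
        next_id = step(features, tokens)
        if next_id == EOS_ID:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


features = toy_encoder([0, 128, 255])                   # encoder: runs once
token_ids = greedy_decode(features, toy_decoder_step)   # decoder: runs per token
text = "".join(VOCAB[i] for i in token_ids)             # post-processing: IDs to text
print(text)
```

The `max_len` cap matters in practice: without it, a decoder that never emits the end-of-sequence token would loop forever.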

Run model on a cloud-hosted device

In addition to the demo, you can run the model on a cloud-hosted Qualcomm® device. The export script does the following:

  • Checks performance on a cloud-hosted device.
  • Downloads compiled assets that can be deployed on-device for Android.
  • Checks accuracy between PyTorch and on-device outputs.

python -m qai_hub_models.models.trocr.export

Profiling Results
------------------------------------------------------------
TrOCRDecoder
Device                          : Samsung Galaxy S23 (13)
Runtime                         : TFLITE                 
Estimated inference time (ms)   : 2.2                    
Estimated peak memory usage (MB): [0, 34]                
Total # Ops                     : 399                    
Compute Unit(s)                 : NPU (399 ops)          

------------------------------------------------------------
TrOCREncoder
Device                          : Samsung Galaxy S23 (13)
Runtime                         : TFLITE                 
Estimated inference time (ms)   : 37.3                   
Estimated peak memory usage (MB): [8, 36]                
Total # Ops                     : 591                    
Compute Unit(s)                 : NPU (591 ops)          
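
If you want to post-process this summary, for example to log results from several runs, the printed block is straightforward to parse. The sketch below is a minimal parser that assumes the exact `key : value` layout shown above; `parse_profile` is a hypothetical helper, not part of qai-hub-models.

```python
def parse_profile(text: str) -> dict:
    """Parse the printed summary into {component: {field: value}}."""
    results: dict = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, dashed separators, and the title line.
        if not line or set(line) == {"-"} or line == "Profiling Results":
            continue
        if ":" in line:
            if current is None:
                continue  # ignore fields that appear before any component name
            key, value = (part.strip() for part in line.split(":", 1))
            results[current][key] = value
        else:
            current = line  # a bare line names the component, e.g. TrOCRDecoder
            results[current] = {}
    return results


SAMPLE = """\
Profiling Results
------------------------------------------------------------
TrOCRDecoder
Device                          : Samsung Galaxy S23 (13)
Runtime                         : TFLITE
Estimated inference time (ms)   : 2.2
Estimated peak memory usage (MB): [0, 34]
"""

profile = parse_profile(SAMPLE)
decoder_ms = float(profile["TrOCRDecoder"]["Estimated inference time (ms)"])
print(decoder_ms)
```

Values are kept as strings so that fields such as the memory range can be interpreted by the caller; only the inference time is converted here.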

How does this work?

This export script leverages Qualcomm® AI Hub to optimize, validate, and deploy this model on-device. Let's go through each step below in detail:

Step 1: Compile model for on-device deployment

To compile a PyTorch model for on-device deployment, we first trace the model in memory using torch.jit.trace and then call the submit_compile_job API.

import torch

import qai_hub as hub
from qai_hub_models.models.trocr import Model

# Load the model
model = Model.from_pretrained()
decoder_model = model.decoder
encoder_model = model.encoder

# Device
device = hub.Device("Samsung Galaxy S23")

# Trace the decoder using the first sample of each input
decoder_input_spec = decoder_model.get_input_spec()
decoder_sample_inputs = decoder_model.sample_inputs()

traced_decoder_model = torch.jit.trace(
    decoder_model,
    [torch.tensor(data[0]) for _, data in decoder_sample_inputs.items()],
)

# Compile the decoder for the chosen device
decoder_compile_job = hub.submit_compile_job(
    model=traced_decoder_model,
    device=device,
    input_specs=decoder_input_spec,
)

# Get the compiled decoder to run on-device
decoder_target_model = decoder_compile_job.get_target_model()

# Trace the encoder using the first sample of each input
encoder_input_spec = encoder_model.get_input_spec()
encoder_sample_inputs = encoder_model.sample_inputs()

traced_encoder_model = torch.jit.trace(
    encoder_model,
    [torch.tensor(data[0]) for _, data in encoder_sample_inputs.items()],
)

# Compile the encoder for the chosen device
encoder_compile_job = hub.submit_compile_job(
    model=traced_encoder_model,
    device=device,
    input_specs=encoder_input_spec,
)

# Get the compiled encoder to run on-device
encoder_target_model = encoder_compile_job.get_target_model()
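
The list comprehension passed to the tracing call is compact, so it may help to see what it selects. The sketch below assumes that `sample_inputs()` returns a dictionary mapping each input name to a list of samples, which is what the `data[0]` indexing suggests. The input names and values here are hypothetical, and plain Python lists stand in for arrays.

```python
# Hypothetical stand-in for model.sample_inputs(): each input name maps to a
# list of samples. Plain lists are used here instead of arrays.
sample_inputs = {
    "input_a": [[1, 2, 3], [4, 5, 6]],  # two samples for a hypothetical input
    "input_b": [[0.5], [0.7]],
}

# Same shape of comprehension as in the tracing call: keep the first sample
# of every input, in the dictionary's insertion order.
first_samples = [data[0] for _, data in sample_inputs.items()]

print(first_samples)
```

Because the positional order of the traced inputs follows the dictionary order, the same ordering should be used for the input specification passed to the compile job.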

Step 2: Performance profiling on cloud-hosted device

After compiling the models in Step 1, you can profile them on-device using the target models. Note that this script runs the models on a device automatically provisioned in the cloud. Once a job is submitted, you can navigate to the provided job URL to view a variety of on-device performance metrics.

decoder_profile_job = hub.submit_profile_job(
    model=decoder_target_model,
    device=device,
)
encoder_profile_job = hub.submit_profile_job(
    model=encoder_target_model,
    device=device,
)

Step 3: Verify on-device accuracy

To verify the accuracy of the model on-device, you can run on-device inference on sample input data on the same cloud-hosted device.

# Run the compiled decoder on sample inputs and download its outputs
decoder_input_data = decoder_model.sample_inputs()
decoder_inference_job = hub.submit_inference_job(
    model=decoder_target_model,
    device=device,
    inputs=decoder_input_data,
)
decoder_on_device_output = decoder_inference_job.download_output_data()

# Run the compiled encoder on sample inputs and download its outputs
encoder_input_data = encoder_model.sample_inputs()
encoder_inference_job = hub.submit_inference_job(
    model=encoder_target_model,
    device=device,
    inputs=encoder_input_data,
)
encoder_on_device_output = encoder_inference_job.download_output_data()

With the output of the model, you can compute metrics such as PSNR or relative error, or spot-check the output against the expected output.
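
A minimal sketch of two such metrics is given below, using only the standard library. It treats each output as a flat list of floats and uses the reference output's largest absolute value as the PSNR peak, which is one common convention and not necessarily the one the export script uses. The sample values are hypothetical.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float]) -> float:
    """Peak signal-to-noise ratio in dB, with the peak taken from the reference."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # outputs are identical
    peak = max(abs(r) for r in reference)
    if peak == 0:
        return -math.inf  # all-zero reference with a non-zero error
    return 10 * math.log10(peak ** 2 / mse)


def max_relative_error(
    reference: Sequence[float], test: Sequence[float], eps: float = 1e-12
) -> float:
    """Largest element-wise |reference - test| / |reference|."""
    return max(abs(r - t) / (abs(r) + eps) for r, t in zip(reference, test))


# Hypothetical flattened outputs: PyTorch reference vs. on-device result.
torch_out = [1.0, 2.0, 4.0]
device_out = [1.0, 2.0, 3.9]

print(f"PSNR: {psnr(torch_out, device_out):.1f} dB")
print(f"Max relative error: {max_relative_error(torch_out, device_out):.3f}")
```

A higher PSNR means the on-device output is closer to the reference. What counts as acceptable depends on the model and precision, so treat any threshold as something to calibrate rather than a fixed rule.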

Note: This on-device profiling and inference requires access to Qualcomm® AI Hub. Sign up for access.

Deploying compiled model to Android

The models can be deployed using multiple runtimes:

  • TensorFlow Lite (.tflite export): This tutorial provides a guide to deploy the .tflite model in an Android application.

  • QNN (.so export): This sample app provides instructions on how to use the .so shared library in an Android application.

View on Qualcomm® AI Hub

Get more details on TrOCR's performance across various devices here. Explore all available models on Qualcomm® AI Hub.

License

  • The license for the original implementation of TrOCR can be found here.
  • The license for the compiled assets for on-device deployment can be found here.
