Ragged Batching Support

#26
by jholm117 - opened

Hello and thanks for the awesome model!

I am running the ONNX version of this model in Triton Server and I am trying to enable ragged batching. From what I can tell, ragged batching requires Triton to pass an additional input to the model that tells it how to split the batched input back apart for each request, since all of the requests get concatenated together into a single tensor.
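For context, this is roughly the model config I would expect to need, based on my reading of the Triton ragged-batching docs. The `batch_input` section is how Triton generates that extra per-request length input; the `target_name` here (`cumulative_seq_lengths`) is just a name I made up, since the model would need a matching input in its ONNX graph to actually consume it:

```protobuf
# Sketch of a config.pbtxt enabling ragged batching (assumption: the
# model's ONNX graph has an extra input matching target_name below).
max_batch_size: 16
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
    # Lets Triton concatenate variable-length requests without padding.
    allow_ragged_batch: true
  }
]
batch_input [
  {
    # Cumulative element count per request, so the model can recover
    # each request's slice of the concatenated input_ids tensor.
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "cumulative_seq_lengths"
    data_type: TYPE_FP32
    source_input: "input_ids"
  }
]
```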

But when I inspect the ONNX file, it appears there are only three available inputs: input_ids, attention_mask, and token_type_ids. Does this mean ragged batching is not supported by this model? And if so, is it on the roadmap by any chance?

Thanks!