Metadata-Version: 2.1
Name: nvidia-pytriton
Version: 0.4.2
Summary: PyTriton - Flask/FastAPI-like interface to simplify Triton's deployment in Python environments.
License: Apache 2.0
Project-URL: Documentation, https://triton-inference-server.github.io/pytriton
Project-URL: Source, https://github.com/triton-inference-server/pytriton
Project-URL: Tracker, https://github.com/triton-inference-server/pytriton/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development
Classifier: Topic :: Scientific/Engineering
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: Unix
Requires-Python: <4,>=3.8
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: numpy~=1.21
Requires-Dist: protobuf>=3.7.0
Requires-Dist: pyzmq~=23.0
Requires-Dist: sh~=1.14
Requires-Dist: tritonclient[all]~=2.39
Requires-Dist: typing_inspect~=0.6.0
Requires-Dist: wrapt>=1.11.0
Provides-Extra: test
Requires-Dist: pytest~=7.2; extra == "test"
Requires-Dist: pytest-codeblocks~=0.16; extra == "test"
Requires-Dist: pytest-mock~=3.8; extra == "test"
Requires-Dist: pytest-timeout~=2.1; extra == "test"
Requires-Dist: alt-pytest-asyncio~=0.7; extra == "test"
Requires-Dist: pytype!=2021.11.18,!=2022.2.17; extra == "test"
Requires-Dist: pre-commit>=2.20.0; extra == "test"
Requires-Dist: tox>=3.23.1; extra == "test"
Requires-Dist: tqdm>=4.64.1; extra == "test"
Requires-Dist: psutil~=5.1; extra == "test"
Requires-Dist: py-spy~=0.3; extra == "test"
Provides-Extra: doc
Requires-Dist: GitPython>=3.1.30; extra == "doc"
Requires-Dist: mike>=2.0.0; extra == "doc"
Requires-Dist: mkdocs-htmlproofer-plugin>=0.8.0; extra == "doc"
Requires-Dist: mkdocs-material>=8.5.6; extra == "doc"
Requires-Dist: mkdocstrings[python]>=0.19.0; extra == "doc"
Provides-Extra: dev
Requires-Dist: nvidia-pytriton[test]; extra == "dev"
Requires-Dist: nvidia-pytriton[doc]; extra == "dev"
Requires-Dist: black>=22.8; extra == "dev"
Requires-Dist: build<1.0.0,>=0.8; extra == "dev"
Requires-Dist: ipython>=7.16; extra == "dev"
Requires-Dist: isort>=5.10; extra == "dev"
Requires-Dist: pudb>=2022.1.3; extra == "dev"
Requires-Dist: pip>=21.3; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"

..
    Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

PyTriton
========

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
The library allows serving Machine Learning models directly from Python through
NVIDIA's `Triton Inference Server`_.

.. _Triton Inference Server: https://github.com/triton-inference-server

In PyTriton, as in Flask or FastAPI, you can define any Python function that executes a machine learning
model prediction and expose it through an HTTP/gRPC API. PyTriton installs Triton Inference Server in your
environment and uses it to handle HTTP/gRPC requests and responses. The library provides a Python API for
attaching a Python function to Triton, along with a communication layer that passes data between Triton and
the function. This lets you use the performance features of Triton Inference Server, such as dynamic batching
or the response cache, without changing your model environment, and thereby improves GPU inference performance
for models implemented in Python. The solution is framework-agnostic and can be used with frameworks such as
PyTorch, TensorFlow, or JAX.
Installation
------------

The package can be installed from `pypi`_ using:

.. _pypi: https://pypi.org/project/nvidia-pytriton/

.. code-block:: text

    pip install -U nvidia-pytriton

More details about installation can be found in the `documentation`_.

.. _documentation: https://triton-inference-server.github.io/pytriton/latest/installation/
Example
-------

The example below shows how to run a Python model in Triton Inference Server without changing the current
working environment. The example uses a simple `Linear` PyTorch model.

The example requires PyTorch to be installed in your environment. You can install it by running:

.. code-block:: text

    pip install torch
In the next step, define the `Linear` model:

.. code-block:: python

    import torch

    model = torch.nn.Linear(2, 3).to("cuda").eval()
Create a function for handling the inference request:

.. code-block:: python

    import numpy as np

    from pytriton.decorators import batch


    @batch
    def infer_fn(**inputs: np.ndarray):
        (input1_batch,) = inputs.values()
        input1_batch_tensor = torch.from_numpy(input1_batch).to("cuda")
        output1_batch_tensor = model(input1_batch_tensor)  # Calling the Python model inference
        output1_batch = output1_batch_tensor.cpu().detach().numpy()
        return [output1_batch]
In the next step, create the connection between the model and Triton Inference Server using the `bind` method:

.. code-block:: python

    from pytriton.model_config import ModelConfig, Tensor
    from pytriton.triton import Triton

    # Connecting inference callback with Triton Inference Server
    with Triton() as triton:
        # Load model into Triton Inference Server
        triton.bind(
            model_name="Linear",
            infer_func=infer_fn,
            inputs=[
                Tensor(dtype=np.float32, shape=(-1,)),
            ],
            outputs=[
                Tensor(dtype=np.float32, shape=(-1,)),
            ],
            config=ModelConfig(max_batch_size=128),
        )
Finally, serve the model with Triton Inference Server:

.. code-block:: python

    from pytriton.triton import Triton

    with Triton() as triton:
        ...  # Load models here
        triton.serve()
The `bind` method creates a connection between Triton Inference Server and the `infer_fn` callable, which
handles the inference queries. The `inputs` and `outputs` arguments describe the model inputs and outputs that
are exposed in Triton. The `config` argument allows setting additional parameters for the model deployment,
such as the maximal batch size.
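
As an illustration only (not part of the original example), batching behavior can also be tuned through
`ModelConfig`. The snippet below is a minimal sketch and assumes the `DynamicBatcher` helper from
`pytriton.model_config` together with its `max_queue_delay_microseconds` parameter:

.. code-block:: python

    from pytriton.model_config import DynamicBatcher, ModelConfig

    # Assumed sketch: let Triton hold requests for up to 100 microseconds
    # so that individual requests can be grouped into larger batches
    # before they reach infer_fn.
    config = ModelConfig(
        max_batch_size=128,
        batcher=DynamicBatcher(max_queue_delay_microseconds=100),
    )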
The `serve` method is blocking; from this point on, the application waits for incoming HTTP/gRPC requests, and
the model is available under the name `Linear` in the Triton server. Inference queries can be sent to
`localhost:8000/v2/models/Linear/infer` and are passed to the `infer_fn` function.
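
As an illustration only (not part of the original example), a query can be sent from Python with PyTriton's
`ModelClient`. The snippet below is a minimal sketch that assumes the server from the previous steps is
running locally and that `ModelClient.infer_batch` accepts a positional NumPy input:

.. code-block:: python

    import numpy as np

    from pytriton.client import ModelClient

    # Assumed sketch: send a batch of two samples, each with two float32
    # features, to the "Linear" model served on the local Triton instance.
    batch = np.random.rand(2, 2).astype(np.float32)

    with ModelClient("localhost:8000", "Linear") as client:
        result = client.infer_batch(batch)  # dict mapping output names to numpy arrays
        print(result)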
Links
-----

* Documentation: https://triton-inference-server.github.io/pytriton
* Source: https://github.com/triton-inference-server/pytriton
* Issues: https://github.com/triton-inference-server/pytriton/issues
* Changelog: https://github.com/triton-inference-server/pytriton/blob/main/CHANGELOG.md
* Known Issues: https://github.com/triton-inference-server/pytriton/blob/main/docs/known_issues.md
* Contributing: https://github.com/triton-inference-server/pytriton/blob/main/CONTRIBUTING.md