Welcome to CLIP-as-service!

GitHub: clip-as-service

Docs: clip-as-service

CLIP-as-service is a low-latency, high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.

⚡ Fast: Serve CLIP models with TensorRT, ONNX Runtime and PyTorch w/o JIT at 800 QPS[*]. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks.

🫐 Elastic: Horizontally scale multiple CLIP models up and down on a single GPU, with automatic load balancing.

🐥 Easy-to-use: No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding.

👒 Modern: Async client support. Easily switch between gRPC, HTTP, and WebSocket protocols with TLS and compression.

🍱 Integration: Smooth integration with the neural search ecosystem, including Jina and DocArray. Build cross-modal and multi-modal solutions in no time.

[*] With the default config (single replica, PyTorch without JIT) on a GeForce RTX 3090.

Try it!

Install

The latest version is available on PyPI.

Make sure you are using Python 3.7+. You can install the client and server independently; you do not need both on the same machine. For example, you can install clip_server on a GPU machine and clip_client on a local laptop.

Client

pip install clip-client

Server (PyTorch)

pip install clip-server

Server (ONNX)

pip install "clip_server[onnx]"

Server (TensorRT)

pip install nvidia-pyindex 
pip install "clip_server[tensorrt]"
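
Whichever variant you installed, a short script can confirm that the machine meets the Python 3.7+ requirement and show which of the two packages is present. This is only a sketch: the helper names `python_ok` and `installed_components` are made up for illustration, and the check assumes the importable module names are `clip_client` and `clip_server`, as used elsewhere in this README.

```python
import importlib.util
import sys


def python_ok(version=sys.version_info, minimum=(3, 7)):
    """Return True if the interpreter meets the Python 3.7+ requirement."""
    return tuple(version[:2]) >= minimum


def installed_components():
    """Report which of the two packages can be imported on this machine."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("clip_client", "clip_server")
    }


print("Python 3.7+:", python_ok())
print(installed_components())
```

On a laptop that only runs the client, seeing `clip_server` reported as missing is expected, since the two packages are installed independently.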

Server on Google Colab

Quick check

After installing, you can run the following commands for a quick connectivity check.

Start the server

Start PyTorch Server

python -m clip_server

Start ONNX Server

python -m clip_server onnx-flow.yml

Start TensorRT Server

python -m clip_server tensorrt-flow.yml

The first time you start the server, it downloads the default pretrained model, which may take a while depending on your network speed. It then prints address information similar to the following:

╭────────────── 🔗 Endpoint ───────────────╮
│  🔗     Protocol                   GRPC  │
│  🏠        Local          0.0.0.0:51000  │
│  🔒      Private    192.168.31.62:51000  │
│  🌍       Public   87.105.159.191:51000  │
╰──────────────────────────────────────────╯

This means the server is ready to serve. Note down the three addresses shown above; you will need them later.

Connect from client

Depending on where the client and server are located, you may need a different IP address:
- Client and server are on the same machine: use the local address, e.g. `0.0.0.0`
- Client and server are connected to the same router: use the private network address, e.g. `192.168.31.62`
- Server is on a public network: use the public network address, e.g. `87.105.159.191`
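
The choice of address can be captured in a small helper. This is a minimal sketch and not part of `clip_client`: the function name `server_url` and the location labels are hypothetical, and the addresses, port, and `grpc` scheme are simply the example values from the endpoint table printed by the server.

```python
# Hypothetical helper (not part of clip_client): map where the client runs
# to the example addresses from the endpoint table printed by the server.
EXAMPLE_ADDRESSES = {
    "same_machine": "0.0.0.0",        # local address
    "same_router": "192.168.31.62",   # private network address
    "public": "87.105.159.191",       # public network address
}


def server_url(location, port=51000, protocol="grpc"):
    """Build a URL such as 'grpc://0.0.0.0:51000' for the given location."""
    try:
        host = EXAMPLE_ADDRESSES[location]
    except KeyError:
        raise ValueError(f"unknown location: {location!r}") from None
    return f"{protocol}://{host}:{port}"


print(server_url("same_machine"))  # grpc://0.0.0.0:51000
```

The resulting string has the same form as the argument passed to `Client(...)`, e.g. `Client('grpc://0.0.0.0:51000')`. Replace the example addresses with the ones your own server printed.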

Run the following Python script:

from clip_client import Client

c = Client('grpc://0.0.0.0:51000')
c.profile()

It prints output similar to the following:

 Roundtrip  16ms  100%
├──  Client-server network  8ms  49%
└──  Server  8ms  51%
    ├──  Gateway-CLIP network  2ms  25%
    └──  CLIP model  6ms  75%
{'Roundtrip': 15.684750003856607, 'Client-server network': 7.684750003856607, 'Server': 8, 'Gateway-CLIP network': 2, 'CLIP model': 6}

It means the client and the server are now connected. Well done!
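
The percentages in the tree can be recomputed from the dictionary printed on the last line of the output. The sketch below hard-codes the sample values instead of calling a server. The helper name `share` is illustrative, and it assumes that the top-level entries are shares of `Roundtrip` and the nested entries are shares of `Server`, which is how the sample output reads.

```python
# Example values copied from the sample profile output (milliseconds).
profile = {
    "Roundtrip": 15.684750003856607,
    "Client-server network": 7.684750003856607,
    "Server": 8,
    "Gateway-CLIP network": 2,
    "CLIP model": 6,
}


def share(timings, part, whole):
    """Return timings[part] / timings[whole] as a rounded integer percentage."""
    return round(100 * timings[part] / timings[whole])


# Top-level entries are fractions of the whole round trip.
for name in ("Client-server network", "Server"):
    print(f"{name}: {share(profile, name, 'Roundtrip')}% of Roundtrip")

# Nested entries are fractions of the time spent on the server.
for name in ("Gateway-CLIP network", "CLIP model"):
    print(f"{name}: {share(profile, name, 'Server')}% of Server")
```

Reading the numbers this way shows where latency goes: in the sample, about half of the round trip is network time between client and server, and most of the server time is spent in the CLIP model itself.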
