Update infinity example #23 by michaelfeil - opened
Added:
infinity_emb
Usage via infinity, MIT Licensed.
docker run \
--gpus "0" -p "7997":"7997" \
michaelf34/infinity:latest \
v2 --model-id dunzhang/stella_en_400M_v5 --revision "refs/pr/24" --dtype bfloat16 --batch-size 16 --device cuda --engine torch --port 7997 --no-bettertransformer
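Once the container is running, embeddings can be requested over HTTP. A minimal sketch, assuming Infinity's OpenAI-compatible /embeddings route on the mapped port; the example input sentence is illustrative only:
curl http://0.0.0.0:7997/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "dunzhang/stella_en_400M_v5", "input": ["What are some ways to reduce stress?"]}'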
michaelfeil changed pull request status to closed
michaelfeil changed pull request status to open
michaelfeil changed pull request status to closed
michaelfeil changed pull request status to open
docker run --gpus "0" -p "7997":"7997" michaelf34/infinity:latest v2 --model-id dunzhang/stella_en_400M_v5 --revision "refs/pr/24" --dtype bfloat16 --batch-size 16 --device cuda --engine torch --port 7997 --no-bettertransformer
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO 2024-11-14 05:18:36,657 infinity_emb INFO: infinity_server.py:89
Creating 1 engines:
engines=['dunzhang/stella_en_400M_v5']
INFO 2024-11-14 05:18:36,662 infinity_emb INFO: Anonymized telemetry.py:30
telemetry can be disabled via environment variable
`DO_NOT_TRACK=1`.
INFO 2024-11-14 05:18:36,670 infinity_emb INFO: select_model.py:64
model=`dunzhang/stella_en_400M_v5` selected, using
engine=`torch` and device=`cuda`
INFO 2024-11-14 05:18:36,936 SentenceTransformer.py:216
sentence_transformers.SentenceTransformer
INFO: Load pretrained SentenceTransformer:
dunzhang/stella_en_400M_v5
Some weights of the model checkpoint at dunzhang/stella_en_400M_v5 were not used when initializing NewModel: ['new.pooler.dense.bias', 'new.pooler.dense.weight']
- This IS expected if you are initializing NewModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NewModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
INFO 2024-11-14 05:19:21,174 SentenceTransformer.py:355
sentence_transformers.SentenceTransformer
INFO: 2 prompts are loaded, with the keys:
['s2p_query', 's2s_query']
/app/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py:1141: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
INFO 2024-11-14 05:19:21,795 infinity_emb INFO: Getting select_model.py:97
timings for batch_size=16 and avg tokens per
sentence=1
4.10 ms tokenization
23.05 ms inference
0.17 ms post-processing
27.32 ms total
embeddings/sec: 585.72
INFO 2024-11-14 05:19:23,600 infinity_emb INFO: Getting select_model.py:103
timings for batch_size=16 and avg tokens per
sentence=512
12.33 ms tokenization
906.90 ms inference
0.48 ms post-processing
919.72 ms total
embeddings/sec: 17.40
INFO 2024-11-14 05:19:23,604 infinity_emb INFO: model select_model.py:104
warmed up, between 17.40-585.72 embeddings/sec at
batch_size=16
INFO 2024-11-14 05:19:23,607 infinity_emb INFO: batch_handler.py:386
creating batching engine
INFO 2024-11-14 05:19:23,609 infinity_emb INFO: ready batch_handler.py:453
to batch requests.
INFO 2024-11-14 05:19:23,613 infinity_emb INFO: infinity_server.py:104
♾️ Infinity - Embedding Inference Server
MIT License; Copyright (c) 2023-now Michael Feil
Version 0.0.69
Open the Docs via Swagger UI:
http://0.0.0.0:7997/docs
Access all deployed models via 'GET':
curl http://0.0.0.0:7997/models
Visit the docs for more information:
https://michaelfeil.github.io/infinity
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
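The startup log notes that anonymized telemetry can be disabled via DO_NOT_TRACK=1. A minimal sketch of passing that environment variable into the container with docker's -e flag, keeping all other arguments from the command above:
docker run \
  --gpus "0" -p "7997":"7997" -e DO_NOT_TRACK=1 \
  michaelf34/infinity:latest \
  v2 --model-id dunzhang/stella_en_400M_v5 --revision "refs/pr/24" --dtype bfloat16 --batch-size 16 --device cuda --engine torch --port 7997 --no-bettertransformer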
infgrad changed pull request status to merged