Update usage example with infinity

#13
docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \
michaelf34/infinity:0.0.68 \
v2 --model-id Alibaba-NLP/gte-base-en-v1.5 --revision "main" --dtype bfloat16 --batch-size 32 --device cuda --engine torch --port 7997
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2024-11-12 23:40:58,030 infinity_emb INFO:        infinity_server.py:89
         Creating 1 engines:
         engines=['Alibaba-NLP/gte-base-en-v1.5']                               
INFO     2024-11-12 23:40:58,035 infinity_emb INFO: Anonymized   telemetry.py:30
         telemetry can be disabled via environment variable                     
         `DO_NOT_TRACK=1`.                                                      
INFO     2024-11-12 23:40:58,042 infinity_emb INFO:           select_model.py:64
         model=`Alibaba-NLP/gte-base-en-v1.5` selected, using                   
         engine=`torch` and device=`cuda`                                       
INFO     2024-11-12 23:41:00,320                      SentenceTransformer.py:216
         sentence_transformers.SentenceTransformer                              
         INFO: Load pretrained SentenceTransformer:                             
         Alibaba-NLP/gte-base-en-v1.5                                           
A new version of the following files was downloaded from https://huggingface.co/Alibaba-NLP/new-impl:
- configuration.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/Alibaba-NLP/new-impl:
- modeling.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


INFO     2024-11-12 23:43:33,218 infinity_emb INFO: Adding    acceleration.py:56
         optimizations via Huggingface optimum.                                 
The class `optimum.bettertransformers.transformation.BetterTransformer` is deprecated and will be removed in a future release.
WARNING  2024-11-12 23:43:33,220 infinity_emb WARNING:        acceleration.py:67
         BetterTransformer is not available for model: <class                   
         'transformers_modules.Alibaba-NLP.new-impl.40ced75c3                   
         017eb27626c9d4ea981bde21a2662f4.modeling.NewModel'>                    
         Continue without bettertransformer modeling code.                      
INFO     2024-11-12 23:43:33,469 infinity_emb INFO: Getting   select_model.py:97
         timings for batch_size=32 and avg tokens per                           
         sentence=1                                                             
                 3.29     ms tokenization                                       
                 6.17     ms inference                                          
                 0.14     ms post-processing                                    
                 9.60     ms total                                              
         embeddings/sec: 3332.34                                                
INFO     2024-11-12 23:43:33,674 infinity_emb INFO: Getting  select_model.py:103
         timings for batch_size=32 and avg tokens per                           
         sentence=512                                                           
                 16.20    ms tokenization                                       
                 71.20    ms inference                                          
                 0.21     ms post-processing                                    
                 87.61    ms total                                              
         embeddings/sec: 365.26              
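The logged throughput follows directly from the batch size and the total latency per batch. A quick sanity check in Python, using the numbers from the log above:

```python
# Sanity-check the reported throughput: embeddings/sec = batch_size / total latency.
batch_size = 32

# Total latencies from the log above (milliseconds).
total_ms_short = 9.60    # avg tokens per sentence = 1
total_ms_long = 87.61    # avg tokens per sentence = 512

eps_short = batch_size / (total_ms_short / 1000.0)  # short sequences
eps_long = batch_size / (total_ms_long / 1000.0)    # 512-token sequences

print(f"{eps_short:.2f} embeddings/sec")  # close to the logged 3332.34
print(f"{eps_long:.2f} embeddings/sec")   # matches the logged 365.26
```

The small gap on the short-sequence number (3333.33 vs. 3332.34) is expected, since the logged per-stage timings are rounded.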
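Once the server is up, it can be queried over HTTP. A minimal client sketch, assuming infinity's OpenAI-compatible /embeddings endpoint and the host/port from the docker command above (localhost:7997); check the server's /docs page for the exact schema:

```python
import json
from urllib import request

def build_embedding_request(texts, base_url="http://localhost:7997"):
    """Build a POST request for the /embeddings endpoint (OpenAI-style payload)."""
    payload = {
        "model": "Alibaba-NLP/gte-base-en-v1.5",  # must match the served --model-id
        "input": texts,
    }
    return request.Request(
        f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embedding_request(["What is infinity?", "An embedding server."])

# Uncomment against a running server; the response is assumed to follow the
# OpenAI embeddings shape, i.e. {"data": [{"embedding": [...]}, ...]}:
# with request.urlopen(req) as resp:
#     embeddings = [item["embedding"] for item in json.load(resp)["data"]]
```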
thenlper changed pull request status to merged
