Update README with instructions for usage with Infinity

#39

Please merge this PR to update the documentation.

Launched on an A100-40G with about 32 GB of GPU memory in use and batch-size=16.
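For context, here is a minimal sketch of how such a deployment might be launched and queried. The CLI flags, the default port 7997, and the OpenAI-style `/embeddings` route are assumptions based on Infinity's general usage, not something specified in this PR:

```python
# Assumed launch command (not part of this PR), roughly matching the settings above:
#   infinity_emb v2 --model-id Alibaba-NLP/gte-Qwen2-7B-instruct --batch-size 16
# The client below assumes the server listens on Infinity's default port 7997 and
# exposes an OpenAI-compatible /embeddings route.
import requests

INFINITY_URL = "http://localhost:7997/embeddings"  # adjust host/port for your deployment

payload = {
    "model": "Alibaba-NLP/gte-Qwen2-7B-instruct",
    "input": [
        "What is the capital of China?",
        "Explain gravity",
    ],
}

resp = requests.post(INFINITY_URL, json=payload, timeout=60)
resp.raise_for_status()

# The response follows the OpenAI embeddings schema: vectors live under data[i]["embedding"].
for item in resp.json()["data"]:
    print(len(item["embedding"]))
```

The startup and timing logs below come from a launch with these settings.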

INFO     2024-11-12 22:13:40,975 infinity_emb INFO:        infinity_server.py:89
         Creating 1engines:                                                     
         engines=['Alibaba-NLP/gte-Qwen2-7B-instruct']                          
INFO     2024-11-12 22:13:40,979 infinity_emb INFO: Anonymized   telemetry.py:30
         telemetry can be disabled via environment variable                     
         `DO_NOT_TRACK=1`.                                                      
INFO     2024-11-12 22:13:40,987 infinity_emb INFO:           select_model.py:64
         model=`Alibaba-NLP/gte-Qwen2-7B-instruct` selected,                    
         using engine=`torch` and device=`cuda`                                 
INFO     2024-11-12 22:13:41,188                      SentenceTransformer.py:216
         sentence_transformers.SentenceTransformer                              
         INFO: Load pretrained SentenceTransformer:                             
         Alibaba-NLP/gte-Qwen2-7B-instruct                                      

INFO     2024-11-12 22:41:25,069                      SentenceTransformer.py:355
         sentence_transformers.SentenceTransformer                              
         INFO: 1 prompts are loaded, with the keys:                             
         ['query']                                                              
INFO     2024-11-12 22:41:26,143 infinity_emb INFO: Getting   select_model.py:97
         timings for batch_size=16 and avg tokens per                           
         sentence=2                                                             
                 2.64     ms tokenization                                       
                 32.47    ms inference                                          
                 0.25     ms post-processing                                    
                 35.36    ms total                                              
         embeddings/sec: 452.54                                                 
INFO     2024-11-12 22:41:27,721 infinity_emb INFO: Getting  select_model.py:103
         timings for batch_size=16 and avg tokens per                           
         sentence=513                                                           
                 7.76     ms tokenization                                       
                 765.84   ms inference                                          
                 0.53     ms post-processing                                    
                 774.13   ms total                                              
         embeddings/sec: 20.67   
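As a quick sanity check, the reported embeddings/sec is simply the batch size divided by the total per-batch latency:

```python
# Sanity check of the logged throughput: batch_size / total latency (seconds).
batch_size = 16

short_batch_total_s = 35.36 / 1000    # avg 2 tokens per sentence
long_batch_total_s = 774.13 / 1000    # avg 513 tokens per sentence

print(batch_size / short_batch_total_s)  # ~452.5 embeddings/sec
print(batch_size / long_batch_total_s)   # ~20.67 embeddings/sec
```

Both values match the log up to rounding, i.e. roughly 20 embeddings/sec at ~512 tokens per input on a single A100-40G.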
thenlper changed pull request status to merged
