vllm-inference / download_model.py

Commit History

feat(add-model): always download the model during build; it will be cached in subsequent builds
8679a35

yusufs committed
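
Commit 8679a35 moves the model download into the image build so that later builds reuse the cached weights. A minimal sketch of what such a build-time download_model.py could look like, assuming huggingface_hub is available and MODEL_ID is an illustrative environment-variable name (not necessarily what the repo uses):

```python
# Build-time download sketch (illustrative; not the repo's actual script).
# Assumes `huggingface_hub` is installed and MODEL_ID is set by the build.
import os

from huggingface_hub import snapshot_download


def main() -> None:
    model_id = os.environ["MODEL_ID"]  # hypothetical variable name
    # Files land in the Hugging Face cache (HF_HOME, ~/.cache/huggingface by
    # default), so rerunning this step in a later build hits the cache
    # instead of re-downloading.
    local_path = snapshot_download(repo_id=model_id)
    print(f"model cached at {local_path}")


if __name__ == "__main__":
    main()
```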

feat(reduce-max-num-batched-tokens): reduce max-num-batched-tokens even though the error message suggests reducing max_model_len
13a5c22

yusufs committed
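
Commit 13a5c22 lowers max-num-batched-tokens even though vLLM's memory error points at max_model_len. A hedged sketch of that trade-off with vLLM's Python API, where the model id and numbers are placeholders rather than the repo's actual values:

```python
# Illustrative vLLM engine settings; model id and values are placeholders.
# Lowering max_num_batched_tokens shrinks the scheduler's per-step token
# budget (and peak memory) while max_model_len keeps the full context window.
from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",     # placeholder model
    max_model_len=4096,            # context window kept as-is
    max_num_batched_tokens=4096,   # reduce this first when memory is tight
)
```

By default vLLM requires max_num_batched_tokens to be at least max_model_len, so the two are kept equal here; chunked prefill relaxes that constraint.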

feat(hf_token): set hf token during build
493a5f1

yusufs committed
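
Commit 493a5f1 wires the Hugging Face token into the build so that gated or private repos can be fetched. A sketch, assuming the token arrives as an HF_TOKEN build secret (the variable name is an assumption):

```python
# Token-aware download sketch (illustrative). Assumes the build injects the
# token as the HF_TOKEN environment variable; gated/private repos require it.
import os

from huggingface_hub import snapshot_download


def download_model(model_id: str) -> str:
    token = os.environ.get("HF_TOKEN")  # assumed name; None is fine for public repos
    return snapshot_download(repo_id=model_id, token=token)
```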

feat(download-model): add model download at runtime
fc30f26

yusufs committed
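
Commit fc30f26 adds a runtime download path as well. Because snapshot_download only fetches files missing from the local cache, a runtime guard can simply call it again before the server starts; this sketch uses the same assumed MODEL_ID and HF_TOKEN variable names as above:

```python
# Runtime guard sketch (illustrative). If the weights were baked in at build
# time they are already cached and this returns immediately; otherwise the
# model is fetched before serving begins.
import os

from huggingface_hub import snapshot_download

if __name__ == "__main__":
    model_id = os.environ.get("MODEL_ID", "facebook/opt-125m")  # assumed names
    path = snapshot_download(repo_id=model_id, token=os.environ.get("HF_TOKEN"))
    print(f"serving model from {path}")
```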