vllm-inference / main.py

Commit History

feat(parse): parse output
b44271e · committed by yusufs

feat(response): should dict only
b41be20 · committed by yusufs

feat(one-model): one model at a time
35decf8 · committed by yusufs

fix(remove): use_cached_output is not an option
6b1968a · committed by yusufs

feat(max_model_len): reducing max_model_len for T4 support
c41cdb4 · committed by yusufs

fix(half-precision): use half precision for T4
d51e450 · committed by yusufs

fix(tensor_parallel_size): set to 1
84c6c4a · committed by yusufs

feat(cuda): add cuda information
2457cd7 · committed by yusufs

fix(remove-params): Removing max_model_len
0ef012d · committed by yusufs

feat(sailor-chat): add sail/Sailor-4B-Chat with the same context length
586265c · committed by yusufs

feat(reduce-max-length): reduce maximum length
2425953 · committed by yusufs

feat(t4-gpu): add t4 gpu capability
4998ce7 · committed by yusufs

feat(first-commit): follow examples and tutorials
ae7cfbb · committed by yusufs
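Taken together, the T4-related commits above (half precision in d51e450, tensor_parallel_size=1 in 84c6c4a, a reduced max_model_len in c41cdb4, and the sail/Sailor-4B-Chat model in 586265c) describe a single engine configuration. A minimal sketch of those settings is below; the `max_model_len` value is an assumption, since the commit messages do not state the exact number, and the dict keys mirror the standard vLLM `LLM` constructor parameters.

```python
# Engine settings reflected in the commit history above.
# T4 GPUs lack bfloat16 support, so the model runs in half precision
# ("half" == float16); a single T4 forces tensor_parallel_size=1; and
# max_model_len is reduced to fit the T4's 16 GB of memory.
engine_args = {
    "model": "sail/Sailor-4B-Chat",  # model added in commit 586265c
    "dtype": "half",                 # fix(half-precision), d51e450
    "tensor_parallel_size": 1,       # fix(tensor_parallel_size), 84c6c4a
    "max_model_len": 4096,           # feat(max_model_len), c41cdb4 (value assumed)
}

# With vLLM installed, these map directly onto the LLM constructor:
#   from vllm import LLM
#   llm = LLM(**engine_args)
print(engine_args)
```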