vllm-inference / main.py

Commit History

feat(parse): parse output
b44271e · committed by yusufs

feat(response): should dict only
b41be20 · committed by yusufs

feat(one-model): one model at a time
35decf8 · committed by yusufs

fix(remove): use_cached_output is not an option
6b1968a · committed by yusufs

feat(max_model_len): reducing max_model_len for T4 support
c41cdb4 · committed by yusufs

fix(half-precision): use half precision for T4
d51e450 · committed by yusufs

fix(tensor_parallel_size): set to 1
84c6c4a · committed by yusufs

feat(cuda): add cuda information
2457cd7 · committed by yusufs

fix(remove-params): Removing max_model_len
0ef012d · committed by yusufs

feat(sailor-chat): add sail/Sailor-4B-Chat with the same context length
586265c · committed by yusufs

feat(reduce-max-length): reduce maximum length
2425953 · committed by yusufs

feat(t4-gpu): add t4 gpu capability
4998ce7 · committed by yusufs

feat(first-commit): follow examples and tutorials
ae7cfbb · committed by yusufs
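Taken together, the T4-related commits above (half precision in d51e450, tensor_parallel_size=1 in 84c6c4a, a reduced max_model_len in c41cdb4, and the sail/Sailor-4B-Chat model in 586265c) describe a single engine configuration. A minimal sketch of those settings is below; the `max_model_len` value is an assumption, since the commit messages do not state the exact number, and the dict keys mirror the standard vLLM `LLM` constructor parameters.

```python
# Engine settings reflected in the commit history above.
# T4 GPUs lack bfloat16 support, so the model runs in half precision
# ("half" == float16); a single T4 forces tensor_parallel_size=1; and
# max_model_len is reduced to fit the T4's 16 GB of memory.
engine_args = {
    "model": "sail/Sailor-4B-Chat",  # model added in commit 586265c
    "dtype": "half",                 # fix(half-precision), d51e450
    "tensor_parallel_size": 1,       # fix(tensor_parallel_size), 84c6c4a
    "max_model_len": 4096,           # feat(max_model_len), c41cdb4 (value assumed)
}

# With vLLM installed, these map directly onto the LLM constructor:
#   from vllm import LLM
#   llm = LLM(**engine_args)
print(engine_args)
```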