OOM when using vLLM to accelerate compute
I'm running the following command on a machine equipped with an RTX 3090 24 GB GPU:
lm_eval --model vllm --model_args pretrained=/root/autodl-tmp/llama3.1_8b/,max_model_len=2048,gpu_memory_utilization=0.7 --tasks leaderboard --batch_size auto --output_path c-eval-result --log_samples --apply_chat_template --fewshot_as_multiturn
However, I encounter an Out-of-Memory (OOM) error. Is there additional memory required beyond what is specified by the gpu_memory_utilization parameter?
Additionally, when I run the same command on an A100 40GB GPU, vLLM should use approximately 40 * 0.7 = 28 GB of GPU memory, but in practice the actual usage is around 35 GB. Given this, can I perform inference on an RTX 3090, or could there be an issue with my parameter settings?
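For reference, here is the back-of-envelope arithmetic I'm working from (assuming the usual ~8B parameter count for Llama-3.1-8B and bf16 weights; the figures are only ballpark estimates):

# Rough memory budget, assuming bf16 weights (2 bytes per parameter) and ~8B parameters.
params = 8.0e9
weights_gb = params * 2 / 1e9        # ~16 GB just for the model weights
budget_3090 = 24 * 0.7               # ~16.8 GB handed to vLLM at gpu_memory_utilization=0.7
budget_a100 = 40 * 0.7               # ~28 GB on the A100
print(f"weights ~{weights_gb:.1f} GB, 3090 budget ~{budget_3090:.1f} GB, A100 budget ~{budget_a100:.1f} GB")
# On the 3090, the 0.7 budget barely covers the weights, leaving almost no room for the KV cache;
# CUDA context and any other processes sit outside this budget and add to what nvidia-smi reports.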
Moreover, I've encountered similar errors using local-completions:
lm_eval --model local-completions --tasks leaderboard_gpqa --output_path result-lm-eval --log_samples --model_args model=02-models/llama3.1-8b-instruction/,base_url=http://127.0.0.1:8001/v1/completions,num_concurrent=4,max_retries=1,tokenized_requests=False,max_length=2048 --batch_size 15
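If it helps, a minimal standalone check along these lines should show whether the model fits on the 3090 at all, independently of the harness (the model path and max_model_len are copied from the command above; gpu_memory_utilization=0.9 and dtype="bfloat16" are simply values I would try, not settings required by lm-eval):

# Minimal vLLM sanity check; if this also OOMs, the issue is the memory budget rather than lm-eval.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/root/autodl-tmp/llama3.1_8b/",
    max_model_len=2048,
    gpu_memory_utilization=0.9,   # a larger fraction than 0.7 so the KV cache has some headroom
    dtype="bfloat16",
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)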
Could you provide guidance on how to resolve these issues or optimize the configuration for the RTX3090? Thank you.
Hi @chenxiaobooo,
Thank you for sharing your setup and configuration! It sounds like you’re encountering some complex memory management challenges. For targeted support with lm-eval configurations and memory optimization, I’d recommend opening an issue on the lm-evaluation-harness GitHub here:
https://github.com/EleutherAI/lm-evaluation-harness
The maintainers and contributors there can provide guidance specifically for that repository, especially as this question is connected to lm-evaluation-harness rather than the Leaderboard.
I'm closing this discussion; please feel free to ping me here if you have any questions, or open a new one!