
some issues

#1
by loveisp - opened

When I try to load the model with vLLM, it consumes all of my system memory (128 GB) and throws an out-of-memory (OOM) error. The pipeline from the transformers library does load, but the inference results are abnormal.
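
For reference, a minimal sketch of the two load paths I mean; the model id is an assumption and should be replaced with this repo's actual id:

```python
# Sketch of the two load paths described above.
# MODEL_ID is an assumed placeholder -- substitute this repo's actual model id.
from vllm import LLM, SamplingParams
from transformers import pipeline

MODEL_ID = "TIGER-Lab/MAmmoTH-7B-Mistral"  # assumption, not confirmed by this thread

# vLLM path: loads the weights and pre-allocates the KV cache.
llm = LLM(model=MODEL_ID)
print(llm.generate(["What is 7 * 8?"], SamplingParams(max_tokens=64)))

# transformers pipeline path: defaults to fp32 on CPU unless a dtype/device is given.
pipe = pipeline("text-generation", model=MODEL_ID)
print(pipe("What is 7 * 8?", max_new_tokens=64))
```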

TIGER-Lab org

Is this OOM on the CPU or the GPU? It should work fine. I have the inference code at https://github.com/TIGER-AI-Lab/MAmmoTH/blob/main/requirements.txt.

It is a CPU OOM. The memory consumption keeps growing and eventually exhausts all 128 GB, which shouldn't happen. Other MAmmoTH models don't exhibit this issue.
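
This is roughly how I watch the growth; just a sketch of tracking the process's resident memory (psutil is an extra dependency, and the load/generate steps are elided):

```python
# Track the resident set size of the current process to see where CPU memory grows.
import os
import psutil

def rss_gb() -> float:
    """Resident memory of this process in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9

print(f"before load:      {rss_gb():.1f} GB")
# ... load the model here (vLLM or transformers) ...
print(f"after load:       {rss_gb():.1f} GB")
# ... run a batch of prompts ...
print(f"after generation: {rss_gb():.1f} GB")
```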

TIGER-Lab org

I see. Normally inference won't take that much memory. Can you confirm whether it comes from vLLM or from the Mistral model itself (Hugging Face transformers)? I think those two are the most likely sources.
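
One rough way to check the transformers path on its own, without vLLM, is a plain fp16 load; the model id below is an assumed placeholder, the arguments are standard transformers options, and `device_map="auto"` needs `accelerate` installed:

```python
# Sketch: load directly with transformers to isolate it from vLLM.
# float16 + low_cpu_mem_usage keeps the CPU-side footprint close to the checkpoint size.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TIGER-Lab/MAmmoTH-7B-Mistral"  # assumption -- replace with the actual id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
)

inputs = tokenizer("What is 7 * 8?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If the resident memory stays flat on this path but balloons under vLLM, that would point at the vLLM setup rather than the model weights.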

I tried using Hugging Face transformers instead of vLLM, and I encountered the same out-of-memory issue.
