How much VRAM is needed to run this model? For example, at the bare-minimum sequence length?

I have 3 GPUs: an NVIDIA RTX 4070 Ti (12 GB), an RTX 4060 Ti (16 GB), and a Tesla T4 (16 GB), and I can't get the model to split across them using this:
"
from transformers import AutoModel
from torch.nn import DataParallel

embedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v1")
for module_key, module in embedding_model._modules.items():
embedding_model._modules[module_key] = DataParallel(module)
"
and changing the batch size using this:
"

get the embeddings with DataLoader (spliting the datasets into multiple mini-batches)

batch_size=2
query_embeddings = model._do_encode(queries, batch_size=batch_size, instruction=query_prefix, max_length=max_length)
passage_embeddings = model._do_encode(passages, batch_size=batch_size, instruction=passage_prefix, max_length=max_length)
"
Even with the max embedding length set to 512, I still get OOM on both the 4070 Ti and the 4060 Ti. So how much VRAM does this model need, and what can I do to run it on my system?
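For a rough sense of scale, here is a back-of-the-envelope estimate of the weight memory alone, assuming the ~7.85B-parameter size listed for this model (an assumption, not something stated in this thread):

"
# Weight memory only; activations during encoding add more on top.
# The 7.85e9 parameter count is an assumption taken from the model listing.
num_params = 7.85e9
for dtype, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{dtype}: ~{num_params * nbytes / 1024**3:.1f} GiB")
"

Under that assumption, the default fp32 load needs roughly 29 GiB for weights alone, which exceeds any single one of the three cards, and even fp16 (~15 GiB) barely fits on a 16 GB card before activations.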

You will realise it only loaded onto one GPU, not all three. That's why you get the OOM error.
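That matches how DataParallel behaves: it replicates the wrapped module onto every GPU at forward time and only splits the batch, so each GPU still has to hold the full model, and the master copy stays on cuda:0. A minimal check (reusing the embedding_model variable from the snippet above) to see where the weights actually live:

"
import torch
from collections import Counter

# Count parameters per device; with the DataParallel wrapping above,
# everything typically still sits on a single device.
print(Counter(str(p.device) for p in embedding_model.parameters()))
print(f"cuda:0 allocated: {torch.cuda.memory_allocated(0) / 1024**3:.1f} GiB")
"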

How could we load it across multiple GPUs, or is that not possible out of the box?
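One thing worth trying (a sketch assuming a recent transformers with accelerate installed, not an official recipe for this model) is to let transformers shard the weights across all visible GPUs instead of replicating them, and to load in half precision:

"
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "nvidia/NV-Embed-v1",
    trust_remote_code=True,      # the model uses custom modeling code
    torch_dtype=torch.float16,   # halves weight memory vs. the default fp32
    device_map="auto",           # shard layers across GPUs (needs `pip install accelerate`)
)

print(model.hf_device_map)       # shows which layers landed on which GPU
"

With roughly 15 GiB of fp16 weights spread over ~44 GB of combined VRAM, there should be headroom for activations. If the 12 GB card still fills up first, the max_memory argument of from_pretrained can cap each device's share, e.g. max_memory={0: "10GiB", 1: "14GiB", 2: "14GiB"}.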
