Issue running inference with "Efficient-Large-Model/NVILA-8B-Video"
Overview
Hi, I am trying to run inference with the "Efficient-Large-Model/NVILA-8B-Video" model but have run into an issue. I also have the LLaVA library installed locally.
Environment:
transformers 4.46.0
llava 1.2.2.post1 /home/ubuntu/cluster_exp/LLaVA
Error:
Traceback (most recent call last):
File "/home/ubuntu/cluster_exp/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1034, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/home/ubuntu/cluster_exp/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 736, in getitem
raise KeyError(key)
KeyError: 'llava_llama'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/cluster_exp/vila_extract_embeddings.py", line 17, in
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda().to(torch.bfloat16)
File "/home/ubuntu/cluster_exp/.venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/ubuntu/cluster_exp/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1036, in from_pretrained
raise ValueError(
ValueError: The checkpoint you are trying to load has model type `llava_llama` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
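From the traceback, AutoConfig resolves the checkpoint's model_type through CONFIG_MAPPING in configuration_auto.py, so it looks like "llava_llama" needs to be registered with transformers before from_pretrained runs. I would have expected something like the following registration to work; note that the import path and class names (llava.model, LlavaLlamaConfig, LlavaLlamaModel) are my assumptions based on the LLaVA codebase and are unverified:

import torch
from transformers import AutoConfig, AutoModel

# Assumed import path and class names from the locally installed llava
# package; the actual module layout in the LLaVA/VILA repo may differ.
from llava.model import LlavaLlamaConfig, LlavaLlamaModel

# Register the custom model_type so CONFIG_MAPPING can resolve it.
AutoConfig.register("llava_llama", LlavaLlamaConfig)
AutoModel.register(LlavaLlamaConfig, LlavaLlamaModel)

model = AutoModel.from_pretrained(
    "Efficient-Large-Model/NVILA-8B-Video",
    torch_dtype=torch.bfloat16,
)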
Code:
import torch
from transformers import AutoModel

BATCH_SIZE = 1  # Process one video at a time
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_PATH = "Efficient-Large-Model/NVILA-8B-Video"

# Load directly in bfloat16; chaining .half() before .to(torch.bfloat16)
# casts the weights twice for no benefit.
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).to(DEVICE, dtype=torch.bfloat16)
model.eval()
print(dir(model))
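I also considered loading through the llava package directly instead of AutoModel; this sketch assumes llava exposes a top-level load() helper as in the VILA examples, which I haven't verified:

import llava

# Assumption: llava.load() exists and handles the custom "llava_llama"
# architecture internally; name and signature unverified.
model = llava.load("Efficient-Large-Model/NVILA-8B-Video")
model.eval()

Is one of these the intended way to load this checkpoint, or is there a supported transformers version for it?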