DAMO-NLP-SG/VideoLLaMA2-8x7B


Sicong committed on Jul 29

Commit 4106b0c • 1 Parent(s): 73c0164

Update README.md

Files changed (1)

README.md +1 -1


@@ -81,7 +81,7 @@ def inference():
     # The woman in the image is wearing a black coat and sunglasses, and she is walking down a rain-soaked city street. The image feels vibrant and lively, with the bright city lights reflecting off the wet pavement, creating a visually appealing atmosphere. The woman's presence adds a sense of style and confidence to the scene, as she navigates the bustling urban environment.
     modal_list = ['image']
     # 1. Initialize the model.
-    model_path = 'DAMO-NLP-SG/VideoLLaMA2-7B-Base'
+    model_path = 'DAMO-NLP-SG/VideoLLaMA2-8x7B'
     model_name = get_model_name_from_path(model_path)
     tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name)
     model = model.to('cuda:0')
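
The one-line change matters beyond the download target, because `model_name` is derived from `model_path` and then passed to `load_pretrained_model`. The sketch below illustrates that dependency under an assumption: that `get_model_name_from_path` returns roughly the last component of the repository path. `sketch_model_name_from_path` is a hypothetical stand-in written for this note, not the library's code, and the real helper may behave differently.

```python
def sketch_model_name_from_path(model_path: str) -> str:
    """Hypothetical stand-in for get_model_name_from_path.

    Assumption: the model name is the final component of the repo path.
    The actual helper in the VideoLLaMA2 codebase may differ.
    """
    # Drop any surrounding slashes, then keep the last path component.
    return model_path.strip("/").split("/")[-1]


# Path used in the README before this commit.
old_name = sketch_model_name_from_path("DAMO-NLP-SG/VideoLLaMA2-7B-Base")
# Path used in the README after this commit.
new_name = sketch_model_name_from_path("DAMO-NLP-SG/VideoLLaMA2-8x7B")

print(old_name)
print(new_name)
```

Under that assumption, the old path would yield a name for the 7B base model inside the 8x7B repository's README, so the example would have loaded a different model from the one the page documents.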