VLM Demo

VLM Demo: Lightweight repo for chatting with models loaded into VLM Bench.


Installation

This repository can be installed as follows:

git clone git@github.com:TRI-ML/vlm-demo.git
cd vlm-demo
pip install -e .

This repository also requires the vlm-bench package (vlbench) and the prismatic-vlms package (prisma) to be installed in the current environment. Both can be installed from source from the following Git repositories:

  • vlm-bench: https://github.com/TRI-ML/vlm-bench
  • prismatic-vlms: https://github.com/TRI-ML/prismatic-vlms
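
As a quick sanity check after installation, the sketch below reports which of the two dependencies cannot be found in the current environment. It assumes the importable module names match the short names given in parentheses above (vlbench and prisma); if the packages expose different module names, adjust the list. The helper missing_packages is a hypothetical name introduced for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names not found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: module names are taken from the README's parenthesized short names.
    required = ["vlbench", "prisma"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks modules up; it does not import them, so it stays fast even when the packages pull in heavy dependencies.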

Usage

The main script to run is interactive_demo.py. The Gradio Controller (serve/gradio_controller.py) and Gradio Web Server (serve/gradio_web_server.py) are implemented in serve. All of this code is heavily adapted from the LLaVA GitHub repo. Details on how the code was modified from the original LLaVA repo are provided in the relevant source files.

To run the demo, run the following commands:

  • Start Gradio Controller: python -m serve.controller --host 0.0.0.0 --port 10000
  • Start Gradio Web Server: python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share
  • Run interactive demo: CUDA_VISIBLE_DEVICES=0 python -m interactive_demo --port 40000 --model_dir <PATH TO MODEL CKPT>
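
The three commands above can also be assembled programmatically, for example to launch them from a single script. The sketch below builds the argument lists using only the flags shown above. The names build_demo_commands and launch_all, and their parameters, are hypothetical. Each process is assumed to be long-running, and the sketch does not wait for one process to be ready before starting the next; add a delay or readiness check if startup order matters.

```python
import os
import subprocess
import sys


def build_demo_commands(model_dir, controller_port=10000, worker_port=40000):
    """Return (name, argv, extra_env) tuples mirroring the three README commands."""
    controller_url = f"http://localhost:{controller_port}"
    return [
        (
            "controller",
            [sys.executable, "-m", "serve.controller",
             "--host", "0.0.0.0", "--port", str(controller_port)],
            {},
        ),
        (
            "web_server",
            [sys.executable, "-m", "serve.gradio_web_server",
             "--controller", controller_url,
             "--model-list-mode", "reload", "--share"],
            {},
        ),
        (
            "interactive_demo",
            [sys.executable, "-m", "interactive_demo",
             "--port", str(worker_port), "--model_dir", model_dir],
            # The README pins the demo to GPU 0 via this environment variable.
            {"CUDA_VISIBLE_DEVICES": "0"},
        ),
    ]


def launch_all(commands):
    """Start each command as a background process and return the Popen handles."""
    procs = []
    for _name, argv, extra_env in commands:
        env = {**os.environ, **extra_env}
        procs.append(subprocess.Popen(argv, env=env))
    return procs


if __name__ == "__main__":
    # Print the commands only; call launch_all(...) to actually start them.
    for name, argv, extra_env in build_demo_commands("<PATH TO MODEL CKPT>"):
        prefix = " ".join(f"{key}={value}" for key, value in extra_env.items())
        print(f"{name}: {prefix + ' ' if prefix else ''}{' '.join(argv)}")
```

Run the script from the repository root so that the serve and interactive_demo modules resolve, and replace the checkpoint placeholder with a real path before calling launch_all.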

When running the demo, the following parameters are adjustable:

  • Temperature
  • Max output tokens

The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other interaction modes for more specific use cases:

  • Captioning: Upload an image without a prompt and the selected model will output a caption. Any prompt the user enters is ignored when producing the caption.
  • Bounding Box Prediction: After uploading an image, use the prompt to describe the portion of the image for which you want bounding box coordinates; the selected model will output the corresponding coordinates.
  • Visual Question Answering: Best when the user wants a short, succinct answer to a specific question provided in the prompt.
  • True/False Question Answering: Best when the user wants a True/False answer to a specific question provided in the prompt.
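
The prompt-handling rules described above can be summarized in a few lines. This is only an illustrative sketch, not the repository's implementation: the mode strings, the function name resolve_prompt, and the choice to raise an error on an empty prompt are all assumptions made for the example.

```python
# Hypothetical mode names; the actual identifiers used by the demo may differ.
PROMPT_FREE_MODES = {"Captioning"}
PROMPT_MODES = {
    "Chat",
    "Bounding Box Prediction",
    "Visual Question Answering",
    "True/False Question Answering",
}


def resolve_prompt(mode, user_prompt):
    """Return the prompt text to use for `mode`, or None if the mode ignores prompts."""
    if mode in PROMPT_FREE_MODES:
        # Captioning ignores any prompt the user typed.
        return None
    if mode in PROMPT_MODES:
        prompt = (user_prompt or "").strip()
        if not prompt:
            # Assumption: prompt-driven modes are treated as requiring a prompt.
            raise ValueError(f"{mode} expects a prompt")
        return prompt
    raise ValueError(f"Unknown interaction mode: {mode}")
```

For example, resolve_prompt("Captioning", "describe this") returns None, while the other modes pass the stripped prompt through unchanged.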

Contributing

Before committing to the repository, make sure to set up your dev environment!

Here are the basic development environment setup guidelines:

  • Fork/clone the repository, performing an editable installation. Make sure to install with the development dependencies (e.g., pip install -e ".[dev]"); this will install black, ruff, and pre-commit.

  • Install pre-commit hooks (pre-commit install).

  • Create a branch for the specific feature/issue, then open a PR against the upstream repository for review.