# SEED Multimodal
Powered by CV Center, Tencent AI Lab, and ARC Lab, Tencent PCG.
## Usage
### Dependencies
- Python >= 3.8 (Anaconda is recommended; see the environment sketch below)
- PyTorch >= 1.11.0
- NVIDIA GPU + CUDA
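For example, a fresh environment could be set up along these lines (the environment name and CUDA wheel are illustrative; adjust them to your machine):

```bash
# Create and activate an isolated conda environment (the name "seed" is arbitrary)
conda create -n seed python=3.8 -y
conda activate seed

# Install a CUDA-enabled PyTorch build; pick the index URL matching your CUDA version
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```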
### Installation
Clone the repo:
```bash
git clone https://github.com/AILab-CVC/SEED.git
cd SEED
```
Install the dependent packages:
```bash
pip install -r requirements.txt
```
### Model Weights
We provide the pretrained SEED tokenizer and de-tokenizer, as well as the instruction-tuned SEED-LLaMA-8B and SEED-LLaMA-14B. Please download the checkpoints and save them under the `./pretrained` folder.
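If the checkpoints are hosted on the Hugging Face Hub, the download could be scripted roughly as follows (the repository IDs below are placeholders, not confirmed names; check the project page for the actual ones):

```bash
mkdir -p pretrained
# Hypothetical repo IDs -- substitute the real ones from the SEED project page
huggingface-cli download AILab-CVC/seed-tokenizer-2 --local-dir pretrained/seed_tokenizer
huggingface-cli download AILab-CVC/seed-llama-14b --local-dir pretrained/seed_llama_14b
```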
To reconstruct images from SEED visual codes using the unCLIP SD-UNet, please download the pretrained unCLIP SD. Rename the checkpoint directory to `diffusion_model` and create a soft link to it under the `pretrained/seed_tokenizer` directory.
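A minimal sketch of that step, assuming the unCLIP SD checkpoint was downloaded to `./unclip_sd` and that the soft link is meant to live inside `pretrained/seed_tokenizer` (both paths are assumptions):

```bash
# Rename the downloaded unCLIP SD checkpoint directory (source name is illustrative)
mv unclip_sd diffusion_model
# Link it under the tokenizer directory so the inference scripts can find it
ln -s "$(pwd)/diffusion_model" pretrained/seed_tokenizer/diffusion_model
```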
### Inference for Visual Tokenization and De-tokenization
To discretize an image into 1D visual codes with causal dependency, and then reconstruct the image from those codes using the off-the-shelf unCLIP SD-UNet, run:
```bash
python scripts/seed_tokenizer_inference.py
```
### Launching the SEED-LLaMA Demo Locally
Start the backend first, then the frontend:
```bash
sh start_backend.sh
sh start_frontend.sh
```
## Citation
If you find this work helpful, please consider citing:
```bibtex
@article{ge2023making,
  title={Making LLaMA SEE and Draw with SEED Tokenizer},
  author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2310.01218},
  year={2023}
}

@article{ge2023planting,
  title={Planting a SEED of Vision in Large Language Model},
  author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2307.08041},
  year={2023}
}
```
The project is still in progress. Stay tuned for more updates!
## License
`SEED` is released under the Apache License Version 2.0. `SEED-LLaMA` is released under the original license of LLaMA2.