---
license: other
pipeline_tag: visual-question-answering
---

# InternLM-XComposer-2.5-OL

[💻Github Repo](https://github.com/InternLM/InternLM-XComposer)

**InternLM-XComposer2.5-OL** is a comprehensive multimodal system for long-term streaming video and audio interactions.

### Import from Transformers

To load the base LLM using Transformers, use the following code:

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    trust_remote_code=True,
)
model.tokenizer = tokenizer
```

To load the base audio model using MS-Swift, use the following code:

```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(
    model_type,
    torch.float16,
    model_id_or_path=model_id_or_path,
    model_dir='audio',
    model_kwargs={'device_map': 'cuda:0'},
)
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
```

## Quickstart

We provide simple examples below to show how to use InternLM-XComposer-2.5-OL with 🤗 Transformers. For a complete guide, see the [examples README](https://github.com/InternLM/InternLM-XComposer/blob/main/InternLM-XComposer-2.5-OmniLive/examples/README.md).
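
Both loading snippets above pass a `model_dir` argument (`'base'` or `'audio'`), which suggests that each component lives in its own sub-directory of the repository. If you would rather load from a local copy, one possible approach is to download the repository once and resolve the component directory yourself. The sketch below assumes that layout; `component_path` and `download_repo` are hypothetical helper names, not part of this project, and the download step relies on `snapshot_download` from `huggingface_hub`.

```python
from pathlib import Path

# Assumption: sub-directory names inferred from the `model_dir` arguments used above.
COMPONENTS = ("base", "audio")


def component_path(repo_root, component):
    """Return the directory of one component inside a local copy of the repo."""
    if component not in COMPONENTS:
        raise ValueError(
            f"unknown component {component!r}; expected one of {COMPONENTS}"
        )
    return Path(repo_root) / component


def download_repo(repo_id="internlm/internlm-xcomposer2d5-ol-7b",
                  local_dir="internlm-xcomposer2d5-ol-7b"):
    """Download the full repository (needs network access and huggingface_hub)."""
    # Imported lazily so that component_path() has no third-party dependency.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

With a local copy in place, `str(component_path(root, 'base'))` could be passed as the first argument to `from_pretrained` in place of the repository id, provided the directory layout matches the assumption above.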

### Audio Understanding

```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(
    model_type,
    torch.float16,
    model_id_or_path=model_id_or_path,
    model_dir='audio',
    model_kwargs={'device_map': 'cuda:0'},
)
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Chinese ASR
query = '
```

### Image Understanding

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    trust_remote_code=True,
)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detail manner'
image = ['examples/images/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```
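
The image example above passes a list of file paths to `model.chat`. A small pre-flight check can catch a missing file before the model is moved to the GPU. This is only a sketch: `check_images` is a hypothetical helper, not part of the model's API, and it checks file existence only, since the set of image formats the model accepts is not stated here.

```python
from pathlib import Path


def check_images(paths):
    """Return the paths as a list of strings, raising if any file is missing."""
    if isinstance(paths, (str, Path)):
        paths = [paths]  # accept a single path as well as a list
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"image file(s) not found: {', '.join(missing)}")
    return [str(p) for p in paths]
```

For instance, `image = check_images(['examples/images/dubai.png'])` could replace the plain list assignment before the call to `model.chat`.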

### Citation

If you find InternLM-XComposer2.5-OL useful for your research and applications, please cite it using this BibTeX:

```bibtex
@misc{zhang2024internlmxcomposer25omnilivecomprehensivemultimodallongterm,
      title={InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions},
      author={Pan Zhang and Xiaoyi Dong and Yuhang Cao and Yuhang Zang and Rui Qian and Xilin Wei and Lin Chen and Yifei Li and Junbo Niu and Shuangrui Ding and Qipeng Guo and Haodong Duan and Xin Chen and Han Lv and Zheng Nie and Min Zhang and Bin Wang and Wenwei Zhang and Xinyue Zhang and Jiaye Ge and Wei Li and Jingwen Li and Zhongying Tu and Conghui He and Xingcheng Zhang and Kai Chen and Yu Qiao and Dahua Lin and Jiaqi Wang},
      year={2024},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.09596},
}
```

### Open Source License

The code is licensed under Apache-2.0. The model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.