---
license: other
pipeline_tag: visual-question-answering
---

# InternLM-XComposer-2.5-OL

[💻GitHub Repo](https://github.com/InternLM/InternLM-XComposer)

**InternLM-XComposer2.5-OL** is a comprehensive multimodal system for long-term streaming video and audio interactions.

### Import from Transformers

To load the base LLM using Transformers, use the following code:

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    trust_remote_code=True,
)
model.tokenizer = tokenizer
```

To load the base audio model using MS-Swift, use the following code:

```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path, model_dir='audio',
                                       model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
```

## Quickstart

We provide simple examples below to show how to use InternLM-XComposer-2.5-OL with 🤗 Transformers. For the complete guide, please refer to [here](examples/README.md).
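The examples in this model card import `torch`, `transformers`, and `swift` (MS-Swift). As a minimal pre-flight sketch, the snippet below reports which of those top-level packages cannot be found in the current environment before any weights are downloaded. The helper name `missing_packages` and the `REQUIRED` list are illustrative assumptions, not part of any of these libraries.

```python
import importlib.util

# Top-level module names taken from the import statements in this card.
REQUIRED = ['torch', 'transformers', 'swift']


def missing_packages(names):
    """Return the subset of `names` that cannot be located as importable modules.

    Hypothetical helper: it only checks that a module spec can be found,
    not that the installed version is compatible with the model.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_packages(REQUIRED)
    if missing:
        print('Missing packages:', ', '.join(missing))
    else:
        print('All required packages were found.')
```

This check uses only the standard library, so it can run in a fresh environment to decide what still needs to be installed.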
### Audio Understanding

```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path, model_dir='audio',
                                       model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Chinese ASR
query = '
```
### Image Understanding

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    trust_remote_code=True,
)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detailed manner'
image = ['examples/images/dubai.png']

# run generation under fp16 autocast with beam search
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```
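The image example passes a list of file paths to `model.chat`. A missing or mistyped path is easier to diagnose before the model is loaded, so the following is a minimal sketch of a path check that could run first. The helper name `check_image_paths` and the set of accepted suffixes are assumptions made for illustration; this card only shows a `.png` input and does not state which formats the model accepts.

```python
from pathlib import Path

# Assumed for illustration; the card itself only demonstrates a .png file.
ALLOWED_SUFFIXES = {'.png', '.jpg', '.jpeg'}


def check_image_paths(paths):
    """Return the paths as strings, raising if one is missing or has an unexpected suffix.

    Hypothetical helper, not part of the model's remote code.
    """
    checked = []
    for path in map(Path, paths):
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f'unexpected image type: {path}')
        if not path.is_file():
            raise FileNotFoundError(f'image not found: {path}')
        checked.append(str(path))
    return checked
```

With this in place, `image = check_image_paths(['examples/images/dubai.png'])` would fail fast with a clear error if the example image has not been downloaded.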
### Open Source License

The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.