---
license: apache-2.0
pipeline_tag: text-generation
---

InternLM-XComposer2

[💻GitHub Repo](https://github.com/InternLM/InternLM-XComposer)

**InternLM-XComposer2** is a vision-language large model (VLLM) based on [InternLM2](https://github.com/InternLM/InternLM) for advanced text-image comprehension and composition.

We release the InternLM-XComposer2 series in two versions:

- InternLM-XComposer2-VL: The pretrained VLLM, with InternLM2 as the initialization of the LLM, achieving strong performance on various multimodal benchmarks.
- InternLM-XComposer2: The finetuned VLLM for *Free-form Interleaved Text-Image Composition*.

### Import from Transformers

To load the InternLM-XComposer2-VL-7B model using Transformers, use the following code:

```python
import torch
from PIL import Image
from transformers import AutoTokenizer, AutoModelForCausalLM

ckpt_path = "internlm/internlm-xcomposer2-vl-7b"
# The tokenizer runs on the CPU and has no .cuda() method, so it is not moved to the GPU.
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded as float32 and may cause an OOM error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
```
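
The float16 note in the loading code can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: it assumes the "7B" in the model name means roughly 7 billion parameters (the exact count, including the vision encoder, is not stated here), and it counts weights only, ignoring activations and other runtime overhead. The names `BYTES_PER_PARAM`, `weight_memory_gib`, and `ASSUMED_PARAMS` are hypothetical helpers, not part of any library.

```python
# Rough estimate of the GPU memory needed just to hold the model weights.
# Assumption: "7B" means about 7 billion parameters; the true count is not
# given in this README, so treat the result as an order-of-magnitude guide.

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


ASSUMED_PARAMS = 7_000_000_000  # hypothetical round figure

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(ASSUMED_PARAMS, dtype):.1f} GiB")
```

Under these assumptions, float16 halves the weight footprint relative to float32, which is why loading in float32 is more likely to exhaust memory on a single GPU.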