myownskyW7 committed
Commit
0dd2645
1 Parent(s): 90ae98e

Update README.md

Files changed (1)
  1. README.md +89 -3
README.md CHANGED
@@ -1,3 +1,89 @@
- ---
- license: apache-2.0
- ---
---
license: other
pipeline_tag: visual-question-answering
---

<p align="center">
<img src="assets/logo_en.png" width="650"/>
</p>
<p align="center">
<b><font size="6">InternLM-XComposer 2.5 OmniLive</font></b>
</p>

[💻 GitHub Repo](https://github.com/InternLM/InternLM-XComposer)

**InternLM-XComposer2.5-OL** is a specialized generalist multimodal system for streaming video and audio interactions.

<div align="center">
InternLM-XComposer2.5-OmniLive <a href="https://huggingface.co/internlm/internlm-xcomposer2d5-ol-7b">🤗</a> <a href="https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2d5-ol-7b"><img src="../assets/modelscope_logo.png" width="20px"></a> &nbsp;| XComposer2.5 OmniLive Technical Report <a href="https://arxiv.org/abs/2407.03320">📄</a>
</div>

## Quickstart

We provide simple examples below to show how to use InternLM-XComposer2.5-OL with 🤗 Transformers. For the complete guide, please refer to the [examples guide](examples/README.md).
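
If you want to run the examples against a local copy of the repository, one option is a minimal sketch using `huggingface_hub`; the local directory name below is arbitrary, and the `audio`/`base` sub-folder layout is an assumption inferred from the `model_dir` arguments used in the examples:

```python
# Minimal sketch: fetch the whole model repository with huggingface_hub.
# 'internlm-xcomposer2d5-ol-7b' as local_dir is arbitrary; the audio/ and base/
# sub-folders referenced by the examples below are assumed to live inside it.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id='internlm/internlm-xcomposer2d5-ol-7b',
    local_dir='internlm-xcomposer2d5-ol-7b',
)
print(f'Model repository downloaded to: {local_path}')
```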
<details>
<summary>
<b>Audio Understanding</b>
</summary>

```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

# load the audio model (model_dir='audio') with the Qwen2-Audio chat template
model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path, model_dir='audio',
                                       model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Chinese ASR
query = '<audio>Detect the language and recognize the speech.'
response, _ = inference(model, template, query, audios='examples/audios/chinese.mp3')
print(f'query: {query}')
print(f'response: {response}')
```
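
The loaded model and template can be reused for further clips; a hypothetical follow-up is sketched below (the English audio path is an assumed example file, not shipped with this card):

```python
# Hypothetical follow-up reusing the model/template loaded above;
# 'examples/audios/english.mp3' is an assumed path for illustration only.
query = '<audio>Detect the language and recognize the speech.'
response, _ = inference(model, template, query, audios='examples/audios/english.mp3')
print(f'response: {response}')
```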

</details>

<details>
<summary>
<b>Image Understanding</b>
</summary>

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init the base vision-language model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-ol-7b', model_dir='base', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-ol-7b', model_dir='base', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detailed manner'
image = ['examples/images/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```
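
A possible second turn is sketched below; the `history` keyword follows the multi-turn usage of InternLM-XComposer2.5 and is an assumption not demonstrated elsewhere on this card:

```python
# Sketch of a multi-turn follow-up: capture the history returned by chat()
# and pass it back. The history= keyword is borrowed from InternLM-XComposer2.5
# multi-turn examples and is an assumption, not something shown on this card.
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, history = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
    follow_up = 'Summarize your previous answer in one sentence.'
    response, _ = model.chat(tokenizer, follow_up, image, history=history,
                             do_sample=False, num_beams=3, use_meta=True)
print(response)
```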

</details>

### Open Source License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.