---
license: other
pipeline_tag: visual-question-answering
---

<p align="center">
  <img src="assets/logo_en.png" width="650"/>
</p>
<p align="center">
  <b><font size="6">InternLM-XComposer 2.5 OmniLive</font></b>
</p>

[💻Github Repo](https://github.com/InternLM/InternLM-XComposer)

**InternLM-XComposer2.5-OL** is a specialized generalist multimodal system for streaming video and audio interactions.

<div align="center">
InternLM-XComposer2.5-OmniLive <a href="https://huggingface.co/internlm/internlm-xcomposer2d5-ol-7b">🤗</a> <a href="https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2d5-ol-7b"><img src="../assets/modelscope_logo.png" width="20px"></a> | XComposer2.5 OmniLive Technical Report <a href="https://arxiv.org/abs/2407.03320">📄</a>
</div>

## Quickstart

We provide simple examples below to show how to use InternLM-XComposer2.5-OL with 🤗 Transformers. For the complete guide, please refer to [the examples README](examples/README.md).

<details>
<summary>
<b>Audio Understanding</b>
</summary>

```python
import os
os.environ['USE_HF'] = 'True'  # download weights via the Hugging Face hub instead of ModelScope

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

# Load the audio model (stored in the 'audio' sub-directory of the repository)
model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path, model_dir='audio',
                                       model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Chinese ASR
query = '<audio>Detect the language and recognize the speech.'
response, _ = inference(model, template, query, audios='examples/audios/chinese.mp3')
print(f'query: {query}')
print(f'response: {response}')
```
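
The audio branch is not limited to transcription: a free-form instruction can accompany the `<audio>` placeholder. The snippet below is a minimal sketch that reuses the `model` and `template` created above; the question wording is only illustrative.

```python
# Free-form audio question answering, reusing model/template from the example above.
query = '<audio>What is the speaker talking about? Answer briefly in English.'
response, _ = inference(model, template, query, audios='examples/audios/chinese.mp3')
print(f'response: {response}')
```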

</details>

<details>
<summary>
<b>Image Understanding</b>
</summary>

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# Init the base vision-language model and tokenizer (stored in the 'base' sub-directory of the repository)
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-ol-7b', model_dir='base', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-ol-7b', model_dir='base', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detailed manner'
image = ['examples/images/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```
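
For a multi-turn exchange about the same image, keep the history returned by the first call and pass it back on the next turn. This is a minimal sketch assuming `model.chat` returns the running history and accepts a `history=` argument, as in the upstream InternLM-XComposer2.5 examples; the follow-up question is illustrative.

```python
# Multi-turn sketch: capture the history from the first turn and feed it back on the second.
# Assumes model.chat(...) returns (response, history) and accepts history=, as in the
# upstream InternLM-XComposer2.5 examples.
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)

follow_up = 'Which building in the image is the tallest?'  # illustrative follow-up question
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, follow_up, image, do_sample=False, num_beams=3, history=his, use_meta=True)
print(response)
```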

</details>

### Open Source License

The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.