finalf0 committed on
Commit 5cb9540
1 Parent(s): 3efec51

Update README.md

Files changed (1)
  1. README.md +3 -65
README.md CHANGED
@@ -18,7 +18,7 @@ pipeline_tag: text-generation
 
 We combine the OmniLMM-12B and GPT-3.5 into a **real-time multimodal interactive assistant**. The assistant accepts video streams from the camera and speech streams from the microphone and emits speech output. While still preliminary, we find the model can **replicate some of the fun cases shown in the Gemini Demo video, without any video editing**.
 
-
+ ## Evaluation
 <table>
 <thead>
 <tr>
@@ -118,68 +118,6 @@ pipeline_tag: text-generation
 ## Demo
 Click here to try out the Demo of [OmniLMM-12B](http://120.92.209.146:8081).
 
- ## Install
-
- 1. Clone this repository and navigate to the source folder
-
- ```bash
- git clone https://github.com/OpenBMB/OmniLMM.git
- cd OmniLMM
- ```
-
- 2. Create a conda environment
-
- ```shell
- conda create -n OmniLMM python=3.10 -y
- conda activate OmniLMM
- ```
-
- 3. Install dependencies
-
- ```shell
- pip install -r requirements.txt
- ```
-
- ## Inference
-
- ### Multi-turn Conversation
- Please refer to the following code to run `OmniLMM`.
-
- <div align="center">
- <img src="assets/COCO_test2015_000000262144.jpg" width="660px">
- </div>
-
- ##### OmniLMM-12B
- ```python
- import json
-
- from chat import OmniLMMChat, img2base64
-
- chat_model = OmniLMMChat('openbmb/OmniLMM-12B')
-
- im_64 = img2base64('./data/COCO_test2015_000000262144.jpg')
-
- # First round of chat
- msgs = [{"role": "user", "content": "What are the people doing?"}]
-
- inputs = {"image": im_64, "question": json.dumps(msgs)}
- answer = chat_model.process(inputs)
- print(answer)
-
- # Second round of chat: pass the history of the multi-turn conversation
- msgs.append({"role": "assistant", "content": answer})
- msgs.append({"role": "user", "content": "Describe the image"})
-
- inputs = {"image": im_64, "question": json.dumps(msgs)}
- answer = chat_model.process(inputs)
- print(answer)
- ```
-
- We can obtain the following results:
- ```
- "The people in the image are playing baseball. One person is pitching a ball, another one is swinging a bat to hit it, and there's also an umpire present who appears to be watching the game closely."
-
- "The image depicts a baseball game in progress. A pitcher is throwing the ball, while another player is swinging his bat to hit it. An umpire can be seen observing the play closely."
- ```
+ ## Usage
+ See [github](https://github.com/OpenBMB/OmniLMM) for more details about usage.
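
For reference, the real-time multimodal assistant described at the top of this diff (camera video in, microphone speech in, spoken answers out) can be approximated by a simple loop around the same `OmniLMMChat` wrapper used in the removed inference example. The sketch below is only illustrative: `capture_frame`, `transcribe_speech`, and `speak` are hypothetical placeholders for the camera, speech-recognition, and text-to-speech components, and the GPT-3.5 dialogue layer mentioned in the README is omitted.

```python
# Minimal sketch of the real-time assistant loop (not the released implementation).
# `capture_frame`, `transcribe_speech`, and `speak` are hypothetical placeholders;
# OmniLMMChat and img2base64 are the wrappers from the repository's chat module.
import json

from chat import OmniLMMChat, img2base64


def run_assistant(capture_frame, transcribe_speech, speak):
    chat_model = OmniLMMChat('openbmb/OmniLMM-12B')
    history = []
    while True:
        question = transcribe_speech()       # wait for the next spoken question
        if not question:
            break
        frame_path = capture_frame()         # save the current camera frame to disk
        history.append({"role": "user", "content": question})
        inputs = {"image": img2base64(frame_path), "question": json.dumps(history)}
        answer = chat_model.process(inputs)  # query the model with image + dialogue history
        history.append({"role": "assistant", "content": answer})
        speak(answer)                        # hand the answer to a text-to-speech engine
```

The maintained code lives in the [github](https://github.com/OpenBMB/OmniLMM) repository referenced by the new Usage section.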
 
 