qiruichen1206@gmail.com committed on
Commit 595112c
Parent: 87f9654

Update README.md

README.md CHANGED
@@ -3,4 +3,13 @@ datasets:
   - SurplusDeficit/MultiHop-EgoQA
 ---
 
-# Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
+# Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
+
+
+## GeLM Model
+
+We propose a novel architecture, termed <b><u>GeLM</u></b>, for *MH-VidQA*. It leverages the world-knowledge reasoning capabilities of multi-modal large language models (LLMs) while incorporating a grounding module that retrieves temporal evidence from the video via flexible grounding tokens.
+
+<div align="center">
+  <img src="./assets/architecture_v3.jpeg" style="width: 80%;">
+</div>
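The README describes the grounding module only at a high level. As a rough illustration, the sketch below shows one way flexible grounding tokens might be wired up: the LLM emits special grounding tokens, and a small head regresses their hidden states to temporal spans. Everything here (the `GroundingHead` name, the dimensions, the span parameterization) is a hypothetical reading of the paragraph above, not code from the GeLM release.

```python
# Hypothetical sketch of a grounding head: the LLM emits special grounding
# tokens, and their hidden states are regressed to temporal spans.
# Names and dimensions are illustrative, not from the GeLM code.
import torch
import torch.nn as nn

class GroundingHead(nn.Module):
    def __init__(self, hidden_dim: int = 4096):
        super().__init__()
        # Map a grounding token's hidden state to a (start, end) pair
        # expressed in normalized video time, i.e. values in [0, 1].
        self.span_proj = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.ReLU(),
            nn.Linear(hidden_dim // 4, 2),
        )

    def forward(self, hidden_states: torch.Tensor,
                grounding_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from the LLM decoder.
        # grounding_mask: (batch, seq_len) bool, True at grounding-token positions.
        tokens = hidden_states[grounding_mask]    # (num_grounding_tokens, hidden_dim)
        spans = self.span_proj(tokens).sigmoid()  # (num_grounding_tokens, 2) in [0, 1]
        # Simplification: nothing here enforces start <= end; a real model
        # might predict (center, width) or sort the pair instead.
        return spans
```

Predicting normalized spans keeps the head independent of absolute video length; the released model may use a different parameterization.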
RTL-GeLM-7B/tokenizer.model CHANGED
Binary files a/RTL-GeLM-7B/tokenizer.model and b/RTL-GeLM-7B/tokenizer.model differ
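The commit also swaps out the tokenizer binary, which is consistent with new special tokens (such as grounding tokens) being added to the vocabulary. Assuming `tokenizer.model` is a standard SentencePiece file, as the filename suggests, one way to see what changed is to compare the two versions directly; the `old/` and `new/` paths below are placeholders for local checkouts of the parent and current revisions:

```python
# Compare two SentencePiece tokenizer files by vocabulary size and added pieces.
# Assumes standard SentencePiece format; paths are placeholders.
import sentencepiece as spm

old_sp = spm.SentencePieceProcessor(model_file="old/RTL-GeLM-7B/tokenizer.model")
new_sp = spm.SentencePieceProcessor(model_file="new/RTL-GeLM-7B/tokenizer.model")

print("old vocab size:", old_sp.vocab_size())
print("new vocab size:", new_sp.vocab_size())

# Pieces present in the new tokenizer but not the old one
# (e.g. newly added special tokens).
old_pieces = {old_sp.id_to_piece(i) for i in range(old_sp.vocab_size())}
new_pieces = {new_sp.id_to_piece(i) for i in range(new_sp.vocab_size())}
print("added pieces:", sorted(new_pieces - old_pieces))
```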
 
assets/architecture_v3.jpeg ADDED