Mountchicken committed
Commit 8c98deb · verified · 1 Parent(s): 692ce93

Upload 6 files

.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/gradio.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/humanref.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/model_arch.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/teaser.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/video_teaser.jpg filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,406 @@
<div align=center>
<img src="assets/teaser.jpg" width=800 >
</div>

<div align=center>

![Static Badge](https://img.shields.io/badge/Rex-Seek-Red) [![arXiv preprint](https://img.shields.io/badge/arxiv_2411.18363-blue%253Flog%253Darxiv)](https://arxiv.org/abs/2503.08507) [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FIDEA-Research%2FRexSeek&count_bg=%233FAEF1&title_bg=%23555555&icon=iconify.svg&icon_color=%23E7E7E7&title=Bros&edge_flat=false)](https://hits.seeyoufarm.com)[![Homepage](https://img.shields.io/badge/homepage-visit-blue)](https://deepdataspace.com/blog/dino-xseek) [![Static Badge](https://img.shields.io/badge/Try_Demo!-blue?logo=chainguard&logoColor=green)](https://cloud.deepdataspace.com/playground/dino-x)

</div>

----

# Contents
- [Contents](#contents)
- [1. Introduction 📚](#1-introduction-)
- [2. Installation 🛠️](#2-installation-️)
  - [2.1 Download Pre-trained Models](#21-download-pre-trained-models)
  - [2.2 Verify Installation](#22-verify-installation)
- [3. Usage 🚀](#3-usage-)
  - [3.1 Model Architecture](#31-model-architecture)
  - [3.2 Combine RexSeek with GroundingDINO](#32-combine-rexseek-with-groundingdino)
    - [3.2.1 Install GroundingDINO](#321-install-groundingdino)
    - [3.2.2 Run the Demo](#322-run-the-demo)
  - [3.3 Combine RexSeek with GroundingDINO and Spacy](#33-combine-rexseek-with-groundingdino-and-spacy)
    - [3.3.1 Install Dependencies](#331-install-dependencies)
    - [3.3.2 Run the Demo](#332-run-the-demo)
  - [3.4 Combine RexSeek with GroundingDINO, Spacy and SAM](#34-combine-rexseek-with-groundingdino-spacy-and-sam)
    - [3.4.1 Install Dependencies](#341-install-dependencies)
    - [3.4.2 Run the Demo](#342-run-the-demo)
- [4. Gradio Demos 🎨](#4-gradio-demos-)
  - [4.1 Gradio Demo for RexSeek + GroundingDINO + SAM](#41-gradio-demo-for-rexseek--groundingdino--sam)
- [5. HumanRef Benchmark](#5-humanref-benchmark)
  - [5.1 Download](#51-download)
  - [5.2 Visualization](#52-visualization)
  - [5.3 Evaluation](#53-evaluation)
    - [5.3.1 Metrics](#531-metrics)
    - [5.3.2 Evaluation Script](#532-evaluation-script)
    - [5.3.3 Evaluate RexSeek](#533-evaluate-rexseek)
- [6. LICENSE](#6-license)
- [BibTeX 📚](#bibtex-)

----

# 1. Introduction 📚
RexSeek is a Multimodal Large Language Model (MLLM) designed to detect people or objects in images based on natural language descriptions. Unlike traditional referring models that focus on single-instance detection, RexSeek excels at multi-instance referring tasks: identifying all the people or objects that match a given description.

### Key Features
- **Multi-Instance Detection**: Can identify multiple matching instances in a single image
- **Robust Perception**: Powered by state-of-the-art person detection models
- **Strong Language Understanding**: Leverages advanced LLM capabilities for complex description comprehension

### The HumanRef Benchmark
We also introduce the HumanRef Benchmark, a comprehensive benchmark for human-centric referring tasks containing:
- 6,000 referring expressions
- An average of 2.2 instances per expression
- Coverage of 6 key aspects of human referring:
  - Attributes (gender, age, clothing, etc.)
  - Position (spatial relationships)
  - Interaction (human-to-human, human-to-object)
  - Reasoning (multi-step inference)
  - Celebrity Recognition
  - Rejection (hallucination detection)

[![Video Name](assets/video_teaser.jpg)](https://github.com/user-attachments/assets/e77ffd20-a26b-418d-8ae8-48565bfefdc7)

----

# 2. Installation 🛠️
```bash
conda create -n rexseek python=3.9 -y
conda activate rexseek
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu121
pip install -v -e .
```

## 2.1 Download Pre-trained Models
We provide model checkpoints for ***RexSeek-3B***. You can download the pre-trained model from the following link:
- [RexSeek-3B Checkpoint](https://huggingface.co/IDEA-Research/RexSeek-3B)

Or you can download it with the following commands:
```bash
# Download the RexSeek-3B checkpoint from Hugging Face
git lfs install
git clone https://huggingface.co/IDEA-Research/RexSeek-3B IDEA-Research/RexSeek-3B
```

## 2.2 Verify Installation
To verify the installation, run the following command:
```bash
python tests/test_local_load.py
```

If the installation is successful, you will get a visualization image in the `tests/images` folder.

# 3. Usage 🚀
## 3.1 Model Architecture
<div align=center>
<img src="assets/model_arch.jpg" width=600 >
</div>

**TL;DR**: ***RexSeek needs a detection model to propose object boxes first, and then uses the LLM to pick out the objects that match the referring expression.***

RexSeek consists of three key components:
1. **Vision Encoders**: Dual-resolution feature extraction (CLIP + ConvNeXt)
2. **Person Detector**: DINO-X for generating high-quality object proposals
3. **Language Model**: Qwen2.5 for understanding complex referring expressions

- **Inputs**:
  - Image: The source image containing people/objects
  - Text: Natural language description of the target objects
  - Boxes: Object proposals from the DINO-X detector (can be replaced with custom boxes)

- **Outputs**:
  - Object indices corresponding to the referring expression, in the format:
```
<ground>referring text</ground><objects><obj1><obj2>...</objects>
```

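The `<objN>` indices point back into the list of input box proposals. As a minimal illustration (this helper is hypothetical and not part of the repo; only the tag format above comes from the model), the answer string can be parsed like this:

```python
import re


def parse_rexseek_answer(answer: str) -> dict:
    """Parse '<ground>...</ground><objects><obj1><obj2>...</objects>' into
    the referring text and the indices of the selected proposal boxes."""
    ground = re.search(r"<ground>(.*?)</ground>", answer, re.DOTALL)
    indices = [int(i) for i in re.findall(r"<obj(\d+)>", answer)]
    return {"referring": ground.group(1) if ground else "", "box_indices": indices}


# Example: proposals 1 and 4 match the referring expression.
print(parse_rexseek_answer(
    "<ground>person that is giving a proposal</ground><objects><obj1><obj4></objects>"
))
# {'referring': 'person that is giving a proposal', 'box_indices': [1, 4]}
```

The returned indices can then be mapped back to the input boxes to draw or crop the referred instances.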
## 3.2 Combine RexSeek with GroundingDINO
In this example, we use GroundingDINO to generate object proposals and then use RexSeek to detect the objects that match the referring expression.

### 3.2.1 Install GroundingDINO
```bash
cd demos/
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -v -e .
mkdir weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth -P weights
cd ../..
```

### 3.2.2 Run the Demo
```bash
python demos/rexseek_grounding_dino.py \
    --image demos/demo_images/demo1.jpg \
    --output demos/demo_images/demo1_result.jpg \
    --referring "person that is giving a proposal" \
    --objects "person" \
    --text-threshold 0.25 \
    --box-threshold 0.25
```

## 3.3 Combine RexSeek with GroundingDINO and Spacy
In the previous example, we had to explicitly specify object categories (like "person") for GroundingDINO to detect. We can make this step automatic by using Spacy to extract the nouns from the referring expression and use them as detection targets.

### 3.3.1 Install Dependencies
```bash
pip install spacy
python -m spacy download en_core_web_sm
```

### 3.3.2 Run the Demo
```bash
python demos/rexseek_grounding_dino_spacy.py \
    --image demos/demo_images/demo1.jpg \
    --output demos/demo_images/demo1_result.jpg \
    --referring "person that is giving a proposal" \
    --text-threshold 0.25 \
    --box-threshold 0.25
```

In this enhanced version:
- There is no need to specify the `--objects` parameter
- Spacy automatically extracts the nouns from the referring expression (e.g., "person" in the example above; see the sketch below)
- GroundingDINO uses these extracted nouns as detection targets
- Interaction through free-form referring expressions is more flexible and natural

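The noun extraction itself is standard spaCy part-of-speech tagging. Here is a minimal sketch of the idea (the demo script's actual logic may differ):

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")


def extract_nouns(referring: str) -> list[str]:
    """Return the nouns in a referring expression, to be used as GroundingDINO prompts."""
    doc = nlp(referring)
    seen, nouns = set(), []
    for tok in doc:
        # Keep nouns and proper nouns, deduplicated while preserving order.
        if tok.pos_ in {"NOUN", "PROPN"} and tok.lemma_ not in seen:
            seen.add(tok.lemma_)
            nouns.append(tok.lemma_)
    return nouns


print(extract_nouns("person that is giving a proposal"))
# ['person', 'proposal']
```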
## 3.4 Combine RexSeek with GroundingDINO, Spacy and SAM
In this example, we use GroundingDINO to generate object proposals, use Spacy to extract nouns from the referring expression as detection targets, and finally use SAM to segment the detected objects.

### 3.4.1 Install Dependencies
```bash
cd demos/
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything
pip install -v -e .
mkdir weights
wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -P weights
cd ../..
```

### 3.4.2 Run the Demo
```bash
python demos/rexseek_grounding_dino_spacy_sam.py \
    --image demos/demo_images/demo1.jpg \
    --output demos/demo_images/demo1_result.jpg \
    --referring "person that is giving a proposal" \
    --text-threshold 0.25 \
    --box-threshold 0.25
```

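Inside the combined demo, the boxes selected by RexSeek serve as box prompts for SAM. A minimal sketch of that last step using the official `segment_anything` API is shown below; the checkpoint path mirrors the install step above, while the image path and box values are purely illustrative, and the demo script's exact code may differ:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load SAM (ViT-H) from the checkpoint downloaded in 3.4.1.
sam = sam_model_registry["vit_h"](checkpoint="demos/segment-anything/weights/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an RGB uint8 image.
image = cv2.cvtColor(cv2.imread("demos/demo_images/demo1.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Boxes picked by RexSeek for the referring expression, in [x1, y1, x2, y2] pixels.
referred_boxes = [[100, 50, 300, 400]]  # illustrative values

masks = []
for box in referred_boxes:
    mask, _, _ = predictor.predict(box=np.array(box), multimask_output=False)
    masks.append(mask[0])  # boolean mask with the same H x W as the image
```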
# 4. Gradio Demos 🎨

## 4.1 Gradio Demo for RexSeek + GroundingDINO + SAM
We provide a Gradio demo for RexSeek + GroundingDINO + SAM. You can start it with the following command:
```bash
python demos/gradio_demo.py \
    --rexseek-path "IDEA-Research/RexSeek-3B" \
    --gdino-config "demos/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py" \
    --gdino-weights "demos/GroundingDINO/weights/groundingdino_swint_ogc.pth" \
    --sam-weights "demos/segment-anything/weights/sam_vit_h_4b8939.pth"
```

<div align=center>
<img src="assets/gradio.jpg" width=600 >
</div>

# 5. HumanRef Benchmark

<div align=center>
<img src="assets/humanref.jpg" width=600 >
</div>

HumanRef is a large-scale human-centric referring expression dataset designed for multi-instance human referring in natural scenes. Unlike traditional referring datasets that focus on one-to-one object referring, HumanRef supports referring to multiple individuals simultaneously through natural language descriptions.

Key features of HumanRef include:

- **Multi-Instance Referring**: A single referring expression can correspond to multiple individuals, better reflecting real-world scenarios
- **Diverse Referring Types**: Covers 6 major types of referring expressions:
  - Attribute-based (e.g., gender, age, clothing)
  - Position-based (relative positions between humans or with the environment)
  - Interaction-based (human-human or human-environment interactions)
  - Reasoning-based (complex logical combinations)
  - Celebrity Recognition
  - Rejection Cases (non-existent references)
- **High-Quality Data**:
  - 34,806 high-resolution images (>1000×1000 pixels)
  - 103,028 referring expressions in the training set
  - 6,000 carefully curated expressions in the benchmark set
  - An average of 8.6 persons per image
  - An average of 2.2 target boxes per referring expression

The dataset aims to advance research in human-centric visual understanding and referring expression comprehension in complex, multi-person scenarios.

## 5.1 Download
You can download the HumanRef Benchmark at [https://huggingface.co/datasets/IDEA-Research/HumanRef](https://huggingface.co/datasets/IDEA-Research/HumanRef).

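The dataset can also be fetched programmatically. A small sketch using `huggingface_hub`; the `local_dir` below simply mirrors the `IDEA-Research/HumanRef` paths used by the commands in the rest of this section:

```python
from huggingface_hub import snapshot_download

# Downloads the benchmark (annotations.jsonl and the images/ folder) locally.
snapshot_download(
    repo_id="IDEA-Research/HumanRef",
    repo_type="dataset",
    local_dir="IDEA-Research/HumanRef",
)
```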
## 5.2 Visualization
The HumanRef Benchmark contains 6 domains, and each domain may have multiple sub-domains.

| Domain | Subdomain | Num Referrings |
|--------|-----------|--------|
| attribute | 1000_attribute_retranslated_with_mask | 1000 |
| position | 500_inner_position_data_with_mask | 500 |
| position | 500_outer_position_data_with_mask | 500 |
| celebrity | 1000_celebrity_data_with_mask | 1000 |
| interaction | 500_inner_interaction_data_with_mask | 500 |
| interaction | 500_outer_interaction_data_with_mask | 500 |
| reasoning | 229_outer_position_two_stage_with_mask | 229 |
| reasoning | 271_positive_then_negative_reasoning_with_mask | 271 |
| reasoning | 500_inner_position_two_stage_with_mask | 500 |
| rejection | 1000_rejection_referring_with_mask | 1000 |

To visualize the dataset, you can run the following command:

```bash
# --domain_anme: attribute, position, interaction, reasoning, celebrity, rejection
# --sub_domain_anme: one of the subdomain names listed in the table above
# --vis_mask: True or False
python rexseek/tools/visualize_humanref.py \
    --anno_path "IDEA-Research/HumanRef/annotations.jsonl" \
    --image_root_dir "IDEA-Research/HumanRef/images" \
    --domain_anme "attribute" \
    --sub_domain_anme "1000_attribute_retranslated_with_mask" \
    --vis_path "IDEA-Research/HumanRef/visualize" \
    --num_images 50 \
    --vis_mask True
```

## 5.3 Evaluation
### 5.3.1 Metrics

We evaluate the referring task using three main metrics: Precision, Recall, and DensityF1 Score.

#### Basic Metrics
- **Precision & Recall**: For each referring expression, a predicted bounding box is considered correct if its IoU with any ground truth box exceeds a threshold. Following the COCO evaluation protocol, we report average performance across IoU thresholds from 0.5 to 0.95 in steps of 0.05.

- **Point-based Evaluation**: For models that only output points (e.g., Molmo), a prediction is considered correct if the predicted point falls within the mask of the corresponding instance. Note that this is less strict than IoU-based metrics.

- **Rejection Accuracy**: For the rejection subset, we calculate:
  ```
  Rejection Accuracy = Number of correctly rejected expressions / Total number of expressions
  ```
  where a correct rejection means the model predicts no boxes for a non-existent reference.

#### DensityF1 Score
To penalize over-detection (predicting too many boxes), we introduce the DensityF1 Score:

```
DensityF1 = (1/N) * Σ [2 * (Precision_i * Recall_i) / (Precision_i + Recall_i) * D_i]
```

where D_i is the density penalty factor:

```
D_i = min(1.0, GT_Count_i / Predicted_Count_i)
```

where:
- N is the number of referring expressions
- GT_Count_i is the total number of persons in image i
- Predicted_Count_i is the number of predicted boxes for referring expression i

This penalty factor reduces the score when models predict significantly more boxes than the actual number of people in the image, discouraging over-detection strategies.

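For concreteness, here is a small, self-contained sketch of these formulas for a single referring expression at one IoU threshold. The official script `rexseek/metric/recall_precision_densityf1.py` is the reference implementation; this only illustrates the math, and the greedy matching used here is a simplification:

```python
def iou(box_a, box_b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter + 1e-9)


def density_f1_single(pred_boxes, gt_boxes, gt_person_count, iou_thr=0.5):
    """Precision, Recall, and DensityF1 for one referring expression at one IoU threshold."""
    matched_gt, tp = set(), 0
    for pb in pred_boxes:  # greedily match each prediction to an unused GT box
        best_j = max(
            (j for j in range(len(gt_boxes)) if j not in matched_gt),
            key=lambda j: iou(pb, gt_boxes[j]),
            default=None,
        )
        if best_j is not None and iou(pb, gt_boxes[best_j]) >= iou_thr:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Density penalty: predicting far more boxes than people in the image is punished.
    density = min(1.0, gt_person_count / len(pred_boxes)) if pred_boxes else 0.0
    return precision, recall, f1 * density
```

For instance, with 2 ground-truth targets, 2 correct predictions out of 4, and 3 people in the image, precision is 0.5, recall is 1.0, the plain F1 is 2/3, and the density factor 3/4 lowers the DensityF1 to 0.5.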
### 5.3.2 Evaluation Script
#### Prediction Format
Before running the evaluation, you need to prepare your model's predictions in the correct format. Each prediction should be a JSON line in a JSONL file with the following structure:

```json
{
    "id": "image_id",
    "extracted_predictions": [[x1, y1, x2, y2], [x1, y1, x2, y2], ...]
}
```

Where:
- `id`: The image identifier matching the ground truth data
- `extracted_predictions`: A list of bounding boxes in `[x1, y1, x2, y2]` format or points in `[x, y]` format

For rejection cases (where no humans should be detected), you should either:
- Include an empty list: `"extracted_predictions": []`
- Include a list with an empty box: `"extracted_predictions": [[]]`

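A minimal sketch of writing predictions in this format (the IDs, box values, and output path are placeholders):

```python
import json

# One JSON line per entry; "id" must match the ground truth data.
predictions = [
    {"id": "example_image_001", "extracted_predictions": [[48.0, 60.0, 180.0, 420.0]]},
    {"id": "example_image_002", "extracted_predictions": []},  # rejection case: no boxes
]

with open("predictions.jsonl", "w") as f:
    for entry in predictions:
        f.write(json.dumps(entry) + "\n")
```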
#### Running the Evaluation
You can run the evaluation script using the following command:
```bash
python rexseek/metric/recall_precision_densityf1.py \
    --gt_path IDEA-Research/HumanRef/annotations.jsonl \
    --pred_path path/to/your/predictions.jsonl \
    --pred_names "Your Model Name" \
    --dump_path IDEA-Research/HumanRef/evaluation_results/your_model_results
```

Parameters:
- `--gt_path`: Path to the ground truth annotations file
- `--pred_path`: Path to your prediction file(s). You can provide multiple paths to compare different models
- `--pred_names`: Names for your models (for display in the results)
- `--dump_path`: Directory to save the evaluation results in markdown and JSON formats

Evaluating Multiple Models:

To compare multiple models, provide multiple prediction files:

```bash
python rexseek/metric/recall_precision_densityf1.py \
    --gt_path IDEA-Research/HumanRef/annotations.jsonl \
    --pred_path model1_results.jsonl model2_results.jsonl model3_results.jsonl \
    --pred_names "Model 1" "Model 2" "Model 3" \
    --dump_path IDEA-Research/HumanRef/evaluation_results/comparison
```

#### Programmatic Usage
```python
from rexseek.metric.recall_precision_densityf1 import recall_precision_densityf1

recall_precision_densityf1(
    gt_path="IDEA-Research/HumanRef/annotations.jsonl",
    pred_path=["path/to/your/predictions.jsonl"],
    dump_path="IDEA-Research/HumanRef/evaluation_results/your_model_results",
)
```

### 5.3.3 Evaluate RexSeek

First, run the following command to generate the predictions:
```bash
python rexseek/evaluation/evaluate_rexseek.py \
    --model_path IDEA-Research/RexSeek-3B \
    --image_folder IDEA-Research/HumanRef/images \
    --question_file IDEA-Research/HumanRef/annotations.jsonl \
    --answers_file IDEA-Research/HumanRef/evaluation_results/eval_rexseek/RexSeek-3B_results.jsonl
```

Then run the following command to evaluate the RexSeek model:
```bash
python rexseek/metric/recall_precision_densityf1.py \
    --gt_path IDEA-Research/HumanRef/annotations.jsonl \
    --pred_path IDEA-Research/HumanRef/evaluation_results/eval_rexseek/RexSeek-3B_results.jsonl \
    --pred_names "RexSeek-3B" \
    --dump_path IDEA-Research/HumanRef/evaluation_results/comparison
```

# 6. LICENSE

RexSeek is licensed under the IDEA License 1.0, Copyright (c) IDEA. All Rights Reserved. Note that this project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses, including but not limited to:
- The [OpenAI Terms of Use](https://openai.com/policies/terms-of-use) for the dataset.
- For the LLM used in this project, the model is [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), which is licensed under the [Qwen RESEARCH LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE).
- For the high-resolution vision encoder, we use [laion/CLIP-convnext_large_d.laion2B-s26B-b102K-augreg](https://huggingface.co/laion/CLIP-convnext_large_d.laion2B-s26B-b102K-augreg), which is licensed under the [MIT LICENSE](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md).
- For the low-resolution vision encoder, we use [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14), which is licensed under the [MIT LICENSE](https://github.com/openai/CLIP/blob/main/LICENSE).

# BibTeX 📚
```
@misc{jiang2025referringperson,
      title={Referring to Any Person},
      author={Qing Jiang and Lin Wu and Zhaoyang Zeng and Tianhe Ren and Yuda Xiong and Yihao Chen and Qin Liu and Lei Zhang},
      year={2025},
      eprint={2503.08507},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.08507},
}
```
assets/gradio.jpg ADDED

Git LFS Details

  • SHA256: e99d9fbf5f04d0b6e2a38b87a62e1838711f784c73f8d5313c01b93aac285ed8
  • Pointer size: 131 Bytes
  • Size of remote file: 906 kB
assets/humanref.jpg ADDED

Git LFS Details

  • SHA256: 408ffc7cfdb85b0b81c4f3995c7656696c2ced5b63a01d779a94f6c733f10844
  • Pointer size: 131 Bytes
  • Size of remote file: 743 kB
assets/model_arch.jpg ADDED

Git LFS Details

  • SHA256: 81b4ec4e1a43aee2e697a07d0c373e70612aabbc187e8fed348ce04dd3bf5cf4
  • Pointer size: 131 Bytes
  • Size of remote file: 294 kB
assets/teaser.jpg ADDED

Git LFS Details

  • SHA256: 8391769c0183d03398f63077bf053cdd8a800e5330e588b45d732b02dba8db0a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.29 MB
assets/video_teaser.jpg ADDED

Git LFS Details

  • SHA256: 36043be61edfe1a4b5acabc4cff67aa6aea1c2bfd932536d188087112e09e01a
  • Pointer size: 131 Bytes
  • Size of remote file: 464 kB