Text Generation · Transformers · Safetensors · GGUF · llava · remyx · Inference Endpoints
salma-remyx committed
Commit cc33b4e (1 parent: 64b04d1)

Update README.md

Files changed (1): README.md (+5 -4)
README.md CHANGED

@@ -7,13 +7,13 @@ license: apache-2.0
 
 # Model Card for SpaceLLaVA
 
-**SpaceLLaVA** uses LoRA to fine-tune [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main) on a dataset designed with [VQASynth](https://github.com/remyxai/VQASynth/tree/main) to enhance spatial reasoning as in [SpatialVLM](https://spatial-vlm.github.io/)
+**SpaceLLaVA** uses LoRA to fine-tune [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main) on a dataset designed with [VQASynth](https://github.com/remyxai/VQASynth/tree/main) to enhance spatial reasoning as in [SpatialVLM](https://spatial-vlm.github.io/)
 
 ## Model Details
 
 ### Model Description
 
-This model uses data synthesis techniques and publicly available models to reproduce the work described in SpatialVLM to enhance the spatial reasoning of multimodal models.
+This model uses data synthesis techniques and publicly available models to reproduce the work described in SpatialVLM to enhance the spatial reasoning of multimodal models.
 With a pipeline of expert models, we can infer spatial relationships between objects in a scene to create a VQA dataset for spatial reasoning.
 
 
@@ -32,7 +32,7 @@ With a pipeline of expert models, we can infer spatial relationships between obj
 Use this model to query spatial relationships between objects in a scene.
 
 ## Citation
-
+```
 @article{chen2024spatialvlm,
   title = {SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities},
   author = {Chen, Boyuan and Xu, Zhuo and Kirmani, Sean and Ichter, Brian and Driess, Danny and Florence, Pete and Sadigh, Dorsa and Guibas, Leonidas and Xia, Fei},
@@ -42,8 +42,9 @@ Use this model to query spatial relationships between objects in a scene.
 }
 
 @misc{liu2023llava,
-  title={Visual Instruction Tuning},
+  title={Visual Instruction Tuning},
   author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
   publisher={NeurIPS},
   year={2023},
 }
+```
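
Since the card's usage note says the model answers queries about spatial relationships between objects in a scene, here is a minimal sketch of what such a query might look like with transformers' LLaVA classes. This is an assumption-laden illustration, not part of the commit: the repo id `remyxai/SpaceLLaVA`, the LLaVA-1.5-style prompt template, and the image URL are hypothetical placeholders.

```python
# Hedged sketch: querying spatial relationships with a LLaVA-style checkpoint.
# Assumes the weights load via transformers' LLaVA classes; the repo id, prompt
# template, and image URL below are illustrative placeholders.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "remyxai/SpaceLLaVA"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Any scene image works; this URL is a stand-in.
image = Image.open(requests.get("https://example.com/scene.jpg", stream=True).raw)
prompt = "USER: <image>\nHow far is the chair from the table? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

The repo also lists GGUF artifacts, so the same kind of query could presumably be run through llama.cpp's LLaVA support instead of transformers.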