# SPRIGHT-T2I Model Card

The SPRIGHT-T2I model is a text-to-image diffusion model with high spatial coherency. It was first introduced in [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://), authored by Agneet Chatterjee<sup>*</sup>, Gabriela Ben Melech Stan<sup>*</sup>, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, and Yezhou Yang.

_(<sup>*</sup> denotes equal contributions)_

The SPRIGHT-T2I model was fine-tuned from [Stable Diffusion v2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) on a subset of the [SPRIGHT dataset](https://huggingface.co/datasets/SPRIGHT-T2I/spright), which contains images and spatially focused captions. Leveraging SPRIGHT, along with efficient training techniques, we achieve state-of-the-art performance in generating spatially accurate images from text.

## Table of contents

* [Model details](#model-details)
* [Usage](#usage)
* [Bias and Limitations](#bias-and-limitations)
* [Training](#training)
* [Evaluation](#evaluation)
* [Model Resources](#model-resources)
* [Citation](#citation)

The training code and more details are available in the [SPRIGHT-T2I GitHub Repository](https://github.com/orgs/SPRIGHT-T2I).

Use the code below to run SPRIGHT-T2I seamlessly and effectively with [🤗's Diffusers library](https://github.com/huggingface/diffusers).

```bash
pip install diffusers transformers accelerate -U
```

Running the pipeline:

```python
image = pipe(prompt).images[0]
image.save("kitten_sitting_in_a_dish.png")
```

<div align="center">
<img src="kitten_sitting_in_a_dish.png" width="300" alt="img">
</div><br>

Additional examples that emphasize spatial coherence:

<div align="center">
<img src="result_images/visor.png" width="1000" alt="img">
</div><br>

## Bias and Limitations

Our key findings are:

- Improve on all aspects of the VISOR score while improving the ZS-FID and CMMD scores on COCO-30K images by 23.74% and 51.69%, respectively
- Enhance the ability to generate 1 and 2 objects, along with generating the correct number of objects, as indicated by evaluation on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
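Benchmarks of this kind probe a fixed set of spatial relations over object pairs. As a minimal sketch of how such probe prompts can be enumerated (the `spatial_prompts` helper is illustrative only, not part of the official VISOR or GenEval tooling):

```python
from itertools import permutations

# Hypothetical helper: enumerate spatial-relationship prompts for
# probing a text-to-image model. Relations mirror the left/right/
# above/below categories used by spatial benchmarks.
RELATIONS = ["to the left of", "to the right of", "above", "below"]

def spatial_prompts(objects, relations=RELATIONS):
    """Return one prompt per ordered object pair and spatial relation."""
    return [
        f"a {a} {rel} a {b}"
        for a, b in permutations(objects, 2)
        for rel in relations
    ]

prompts = spatial_prompts(["cat", "dog"])
print(len(prompts))  # 2 ordered pairs x 4 relations = 8
print(prompts[0])    # a cat to the left of a dog
```

Each prompt can then be fed to the pipeline and the generated image scored for whether the stated relation holds.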

### Model Resources

- **Dataset:** [SPRIGHT Dataset](https://huggingface.co/datasets/SPRIGHT-T2I/spright)
- **Repository:** [SPRIGHT-T2I GitHub Repository](https://github.com/orgs/SPRIGHT-T2I)
- **Paper:** [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://)
- **Demo:** [SPRIGHT-T2I on Spaces](https://huggingface.co/spaces/SPRIGHT-T2I/SPRIGHT-T2I)
- **Project Website:** [SPRIGHT Website](https://spright.github.io/)

## Citation

Coming soon