JingyeChen22 committed
Commit a0ee6a6 • 1 Parent(s): 215a958
Update app.py
app.py
CHANGED
@@ -449,6 +449,7 @@ with gr.Blocks() as demo:
 [<a href="https://arxiv.org/abs/2311.16465" style="color:blue;">arXiv</a>]
 [<a href="https://github.com/microsoft/unilm/tree/master/textdiffuser-2" style="color:blue;">Code</a>]
 [<a href="https://jingyechen.github.io/textdiffuser2/" style="color:blue;">Project Page</a>]
+[<a href="https://discord.gg/q7eHPupu" style="color:purple;">Discord</a>]
 </h3>
 <h2 style="text-align: left; font-weight: 450; font-size: 1rem; margin-top: 0.5rem; margin-bottom: 0.5rem">
 We propose <b>TextDiffuser-2</b>, aiming at unleashing the power of language models for text rendering. Specifically, we <b>tame a language model into a layout planner</b> to transform user prompt into a layout using the caption-OCR pairs. The language model demonstrates flexibility and automation by inferring keywords from user prompts or incorporating user-specified keywords to determine their positions. Secondly, we <b>leverage the language model in the diffusion model as the layout encoder</b> to represent the position and content of text at the line level. This approach enables diffusion models to generate text images with broader diversity.