Spaces: Running on A10G
JingyeChen committed • 2ade25e
1 parent: aea387e
update
app.py
CHANGED
@@ -42,7 +42,7 @@ os.system('ls')
 
 #### import m1
 from fastchat.model import load_model, get_conversation_template
-m1_model_path = '
+m1_model_path = 'JingyeChen22/textdiffuser2_layout_planner'
 m1_model, m1_tokenizer = load_model(
     m1_model_path,
     'cuda',
@@ -356,7 +356,7 @@ with gr.Blocks() as demo:
         We propose <b>TextDiffuser-2</b>, aiming at unleashing the power of language models for text rendering. Specifically, we <b>tame a language model into a layout planner</b> to transform user prompt into a layout using the caption-OCR pairs. The language model demonstrates flexibility and automation by inferring keywords from user prompts or incorporating user-specified keywords to determine their positions. Secondly, we <b>leverage the language model in the diffusion model as the layout encoder</b> to represent the position and content of text at the line level. This approach enables diffusion models to generate text images with broader diversity.
         </h2>
         <h2 style="text-align: left; font-weight: 450; font-size: 1rem; margin-top: 0.5rem; margin-bottom: 0.5rem">
-        <b>Tips for using this demo</b>: <b>(1)</b> Please carefully read the disclaimer in the below. <b>(2)</b> The specification of keywords is optional. If provided, the language model will do its best to plan layouts using the given keywords. <b>(3)</b> If a template is given, the layout planner (M1) is not used. <b>(4)</b> Three operations, including redo, undo, and skip are provided. When using skip, only the left-top point of a keyword will be recorded, resulting in more diversity but sometimes decreasing the accuracy. <b>(5)</b> The layout planner can produce different layouts. You can
+        <b>Tips for using this demo</b>: <b>(1)</b> Please carefully read the disclaimer in the below. <b>(2)</b> The specification of keywords is optional. If provided, the language model will do its best to plan layouts using the given keywords. <b>(3)</b> If a template is given, the layout planner (M1) is not used. <b>(4)</b> Three operations, including redo, undo, and skip are provided. When using skip, only the left-top point of a keyword will be recorded, resulting in more diversity but sometimes decreasing the accuracy. <b>(5)</b> The layout planner can produce different layouts. You can increase the temperature to enhance the diversity.
         </h2>
 
         <style>
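The commit above wires up the layout planner (M1) that turns a user prompt plus optional keywords into a text layout. As a minimal sketch of how such a planner might be driven, the helpers below compose a planning request and parse a layout response. Note the prompt template and the `keyword left,top,right,bottom` output format are illustrative assumptions, not the demo's actual protocol, and `build_m1_prompt`/`parse_layout` are hypothetical helper names.

```python
# Hypothetical sketch: drive a layout-planner LM like the one loaded in the diff.
# The prompt wording and the "keyword l,t,r,b" line format are assumptions.

def build_m1_prompt(user_prompt: str, keywords: list[str]) -> str:
    """Compose a layout-planning request from the user prompt and optional keywords."""
    prompt = f"Given the prompt: {user_prompt}\nPlan a text layout."
    if keywords:
        prompt += "\nUse these keywords: " + ", ".join(keywords)
    return prompt

def parse_layout(response: str) -> list[tuple[str, int, int, int, int]]:
    """Parse lines of the assumed form 'keyword l,t,r,b' into (word, l, t, r, b) boxes."""
    boxes = []
    for line in response.strip().splitlines():
        word, coords = line.rsplit(" ", 1)
        left, top, right, bottom = (int(c) for c in coords.split(","))
        boxes.append((word, left, top, right, bottom))
    return boxes

# Example: a (made-up) planner response for a two-word layout.
print(parse_layout("HELLO 10,20,120,48\nWORLD 10,60,130,88"))
```

In the real demo, the parsed boxes would then condition the diffusion model; raising the planner's sampling temperature (tip 5 above) yields more varied layouts for the same prompt.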