---
language:
- en
tags:
- reco
- text-to-image
- layout-to-image
pipeline_tag: text-to-image
widget:
- text: "A box contains six donuts with varying types of glazes and toppings. <|endoftext|> <|startoftext|> chocolate donut. <|endoftext|> <|startoftext|> dark vanilla donut. <|endoftext|> <|startoftext|> donut with sprinkles. <|endoftext|> <|startoftext|> donut with powdered sugar. <|endoftext|> <|startoftext|> pink donut. <|endoftext|> <|startoftext|> brown donut. <|endoftext|>"
---

# Diffusers 🧨 port of [ReCo: Region-Controlled Text-to-Image Generation (CVPR 2023)](https://arxiv.org/abs/2211.15518)

- Original authors: Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
- Original GitHub repo by the authors: https://github.com/microsoft/ReCo
- Converted to Diffusers: Jaemin Cho

# LAION checkpoint

- Original PyTorch Lightning checkpoint: https://unitab.blob.core.windows.net/data/reco/reco_laion_1232.ckpt
- Original configuration YAML: https://github.com/microsoft/ReCo/blob/main/configs/reco/v1-finetune_laion.yaml

# Example Usage

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "j-min/reco_sd14_laion", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = (
    "A box contains six donuts with varying types of glazes and toppings. <|endoftext|> "
    "<|startoftext|> chocolate donut. <|endoftext|> "
    "<|startoftext|> dark vanilla donut. <|endoftext|> "
    "<|startoftext|> donut with sprinkles. <|endoftext|> "
    "<|startoftext|> donut with powdered sugar. <|endoftext|> "
    "<|startoftext|> pink donut. <|endoftext|> "
    "<|startoftext|> brown donut. <|endoftext|>"
)

generated_image = pipe(prompt, guidance_scale=4).images[0]
generated_image
```

## Method to create ReCo prompts

A ReCo prompt starts with the global caption followed by `<|endoftext|>`; each region then contributes four entries derived from its quantized box coordinates, followed by its regional caption wrapped in `<|startoftext|>` and `<|endoftext|>`.

```python
def create_reco_prompt(
    caption: str = '',
    phrases=(),
    boxes=(),
    normalize_boxes=True,
    image_resolution=512,
    num_bins=1000,
):
    """
    Create a ReCo prompt string.

    caption: global caption
    phrases: list of regional captions
    boxes: list of regional coordinates in xyxy format; pixel coordinates
        when normalize_boxes=True, already normalized to [0, 1] otherwise
    """
    SOS_token = '<|startoftext|>'
    EOS_token = '<|endoftext|>'

    box_captions_with_coords = []
    box_captions_with_coords += [caption]
    box_captions_with_coords += [EOS_token]

    for phrase, box in zip(phrases, boxes):
        if normalize_boxes:
            # pixel coordinates -> [0, 1]
            box = [float(x) / image_resolution for x in box]

        # quantize each normalized coordinate into one of num_bins bins
        quant_x0 = int(round(box[0] * (num_bins - 1)))
        quant_y0 = int(round(box[1] * (num_bins - 1)))
        quant_x1 = int(round(box[2] * (num_bins - 1)))
        quant_y1 = int(round(box[3] * (num_bins - 1)))

        # ReCo format
        # Add SOS/EOS before/after regional captions
        box_captions_with_coords += [
            f"",
            f"",
            f"",
            f"",
            SOS_token,
            phrase,
            EOS_token
        ]

    text = " ".join(box_captions_with_coords)
    return text


caption = "a photo of bus and boat; boat is left to bus."
phrases = ["a photo of a bus.", "a photo of a boat."]
boxes = [[0.702, 0.404, 0.927, 0.601], [0.154, 0.383, 0.311, 0.487]]
prompt = create_reco_prompt(caption, phrases, boxes, normalize_boxes=False)
prompt
# 'a photo of bus and boat; boat is left to bus. <|endoftext|> <|startoftext|> a photo of a bus. <|endoftext|> <|startoftext|> a photo of a boat. <|endoftext|>'

caption = "A box contains six donuts with varying types of glazes and toppings."
phrases = [
    "chocolate donut.",
    "dark vanilla donut.",
    "donut with sprinkles.",
    "donut with powdered sugar.",
    "pink donut.",
    "brown donut.",
]
boxes = [
    [263.68, 294.912, 380.544, 392.832],
    [121.344, 265.216, 267.392, 401.92],
    [391.168, 294.912, 506.368, 381.952],
    [120.064, 143.872, 268.8, 270.336],
    [264.192, 132.928, 393.216, 263.68],
    [386.048, 148.48, 490.688, 259.584],
]
prompt = create_reco_prompt(caption, phrases, boxes)
prompt
# 'A box contains six donuts with varying types of glazes and toppings. <|endoftext|> <|startoftext|> chocolate donut. <|endoftext|> <|startoftext|> dark vanilla donut. <|endoftext|> <|startoftext|> donut with sprinkles. <|endoftext|> <|startoftext|> donut with powdered sugar. <|endoftext|> <|startoftext|> pink donut. <|endoftext|> <|startoftext|> brown donut. <|endoftext|>'
```
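Inside `create_reco_prompt`, each coordinate is divided by `image_resolution` (when `normalize_boxes=True`) and then mapped to an integer bin with `int(round(c * (num_bins - 1)))`, so with the defaults neighbouring bins are 512 / 999 ≈ 0.51 px apart. The sketch below isolates that arithmetic so a single box can be checked on its own. `quantize_box` and `dequantize_box` are hypothetical helpers written for illustration, not functions from ReCo or Diffusers, and the inverse only recovers coordinates to within half a bin.

```python
def quantize_box(box, normalize_boxes=True, image_resolution=512, num_bins=1000):
    """Hypothetical helper: the same quantization create_reco_prompt applies to one xyxy box."""
    if normalize_boxes:
        # pixel coordinates -> [0, 1]
        box = [float(x) / image_resolution for x in box]
    # each normalized coordinate becomes an integer bin in 0 .. num_bins - 1
    return [int(round(c * (num_bins - 1))) for c in box]


def dequantize_box(bins, image_resolution=512, num_bins=1000):
    """Hypothetical inverse: bin indices back to approximate pixel coordinates."""
    return [b / (num_bins - 1) * image_resolution for b in bins]


# bus box from the example above, already normalized to [0, 1]
print(quantize_box([0.702, 0.404, 0.927, 0.601], normalize_boxes=False))
# [701, 404, 926, 600]

# first donut box from the example above, in 512x512 pixel coordinates
donut = [263.68, 294.912, 380.544, 392.832]
bins = quantize_box(donut)
print(bins)
# [514, 575, 743, 766]

# round trip: each value is within half a bin (512 / 999 / 2 ≈ 0.26 px) of the input
print(dequantize_box(bins))
```

Python's built-in `round` sends exact halves to the nearest even integer; the sketch keeps `int(round(...))` unchanged so its bins match the function above.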
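`create_reco_prompt` does not validate its inputs: `zip` silently drops unmatched items when `phrases` and `boxes` differ in length, and a box with `x0 > x1` or one that extends past the image is quantized without complaint, the latter into bin indices outside `0 .. num_bins - 1`. A minimal pre-flight check is sketched below, assuming pixel-space xyxy boxes as in the donut example. `validate_boxes` is a hypothetical helper, not part of ReCo or Diffusers; for boxes that are already normalized (the `normalize_boxes=False` case), pass `image_resolution=1`.

```python
def validate_boxes(phrases, boxes, image_resolution=512):
    """Hypothetical pre-flight check for xyxy boxes passed to create_reco_prompt.

    Raises ValueError on the first problem found; returns None otherwise.
    """
    if len(phrases) != len(boxes):
        # zip() in create_reco_prompt would silently drop the unmatched items
        raise ValueError(f"{len(phrases)} phrases but {len(boxes)} boxes")
    for i, box in enumerate(boxes):
        if len(box) != 4:
            raise ValueError(f"box {i} has {len(box)} values, expected 4 (x0, y0, x1, y1)")
        x0, y0, x1, y1 = box
        if not (x0 < x1 and y0 < y1):
            raise ValueError(f"box {i} is not xyxy with x0 < x1 and y0 < y1: {box}")
        if min(box) < 0 or max(box) > image_resolution:
            raise ValueError(f"box {i} lies outside the {image_resolution}px image: {box}")


# passes silently: the "pink donut." box from the example above
validate_boxes(["pink donut."], [[264.192, 132.928, 393.216, 263.68]])

# x0 and x1 swapped, so the check rejects it
try:
    validate_boxes(["pink donut."], [[393.216, 132.928, 264.192, 263.68]])
except ValueError as err:
    print(err)
```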