import gradio
import torch

from diffusers import StableDiffusionInpaintPipeline
from diffusers import DDIMScheduler, LMSDiscreteScheduler, PNDMScheduler

deviceStr = "cuda" if torch.cuda.is_available() else "cpu"
device = torch.device(deviceStr)

if deviceStr == "cuda":
    # fp16 weights on GPU; the lambda disables the safety checker while keeping
    # the (images, has_nsfw_concepts) return shape the pipeline expects.
    pipeline = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        revision="fp16",
        torch_dtype=torch.float16,
        safety_checker=lambda images, **kwargs: (images, [False] * len(images)))
    pipeline.to(device)
    pipeline.enable_xformers_memory_efficient_attention()
else:
    pipeline = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        safety_checker=lambda images, **kwargs: (images, [False] * len(images)))

schedulers = ["DDIMScheduler", "LMSDiscreteScheduler", "PNDMScheduler"]
schedulerClasses = {
    "DDIMScheduler": DDIMScheduler,
    "LMSDiscreteScheduler": LMSDiscreteScheduler,
    "PNDMScheduler": PNDMScheduler,
}

latentNoiseInputs = ["Uniform", "Low Discrepancy Sequence"]

def diffuse(prompt, negativePrompt, inputImage, mask, guidanceScale, numInferenceSteps, seed, noiseScheduler, latentNoise):
    if mask is None:
        return inputImage

    # Swap in the scheduler selected in the UI, rebuilt from the current config.
    pipeline.scheduler = schedulerClasses[noiseScheduler].from_config(pipeline.scheduler.config)

    # Gradio sliders report floats; the generator seed and step count need ints.
    generator = torch.Generator(device).manual_seed(int(seed))

    # latentNoise is currently unused; see the sketch after launch() for one
    # way to wire it into the pipeline's initial latents.
    newImage = pipeline(prompt=prompt,
                        negative_prompt=negativePrompt,
                        image=inputImage,
                        mask_image=mask,
                        guidance_scale=guidanceScale,
                        num_inference_steps=int(numInferenceSteps),
                        generator=generator).images[0]
    return newImage

prompt = gradio.Textbox(label="Prompt", placeholder="A person in a room", lines=3)
negativePrompt = gradio.Textbox(label="Negative Prompt", placeholder="Text", lines=3)
inputImage = gradio.Image(label="Input Feed", source="webcam", type="pil", shape=[512, 512], streaming=True)
mask = gradio.Image(label="Mask", type="pil")
outputImage = gradio.Image(label="Extrapolated Field of View")

# Stable Diffusion typically uses a guidance scale around 7.5; values <= 1
# disable classifier-free guidance entirely.
guidanceScale = gradio.Slider(label="Guidance Scale", maximum=20, value=7.5)
numInferenceSteps = gradio.Slider(label="Number of Inference Steps", maximum=100, value=25, step=1)
seed = gradio.Slider(label="Generator Seed", maximum=1000, value=512, step=1)
noiseScheduler = gradio.Dropdown(schedulers, label="Noise Scheduler", value="DDIMScheduler")
latentNoise = gradio.Dropdown(latentNoiseInputs, label="Latent Noise", value="Uniform")

inputs = [prompt, negativePrompt, inputImage, mask, guidanceScale, numInferenceSteps, seed, noiseScheduler, latentNoise]

ux = gradio.Interface(fn=diffuse, title="View Diffusion", inputs=inputs, outputs=outputImage, live=True)
ux.launch()
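
# --- Sketch: wiring the "Latent Noise" dropdown -------------------------------
# The UI above exposes a "Latent Noise" choice that diffuse() does not use yet.
# Below is a minimal sketch of one way to honor it, assuming a 512x512 pipeline
# whose latent space is (1, 4, 64, 64) and a pipeline that accepts a `latents`
# keyword: draw either Gaussian noise or a scrambled Sobol low-discrepancy
# sequence mapped through the normal inverse CDF. The helper name makeLatents
# is hypothetical, and this block is illustrative only; it sits after the
# blocking launch() call and is never invoked by the app above.
def makeLatents(latentNoise, seed, dtype=torch.float32):
    if latentNoise == "Low Discrepancy Sequence":
        # SobolEngine yields quasi-random samples in [0, 1); scrambling with
        # the seed keeps different seeds producing different noise.
        sobol = torch.quasirandom.SobolEngine(dimension=4 * 64 * 64, scramble=True, seed=int(seed))
        uniform = sobol.draw(1).clamp(1e-6, 1.0 - 1e-6)
        # Map uniform samples to a standard normal via the inverse CDF.
        latents = torch.distributions.Normal(0.0, 1.0).icdf(uniform).reshape(1, 4, 64, 64)
    else:  # "Uniform" falls back to the standard Gaussian initialization.
        generator = torch.Generator("cpu").manual_seed(int(seed))
        latents = torch.randn((1, 4, 64, 64), generator=generator)
    return latents.to(device=device, dtype=dtype)

# Usage inside diffuse() would then look like:
#   latents = makeLatents(latentNoise, seed, dtype=pipeline.unet.dtype)
#   newImage = pipeline(..., latents=latents).images[0]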