Spaces:
Runtime error
initial inference time - 30-40 sec
1) lowered num_steps for the diffusion model from 20 to 10 - inference time = 17-19 sec
2) moved the ONNX model from CPU compute to GPU - inference time = 12-14 sec (cold starts take more time; see the sketch below)
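As a rough illustration of step 2, this is how an ONNX model can be run on the GPU with onnxruntime-gpu; the model path and names here are placeholders, not the actual files used in the Space.

import onnxruntime as ort
import numpy as np

# Sketch only: "humanparse.onnx" is a placeholder model path.
session = ort.InferenceSession(
    "humanparse.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # CUDA first, CPU as fallback
)

def run_model(input_array: np.ndarray):
    input_name = session.get_inputs()[0].name
    # session.run returns a list of output arrays
    return session.run(None, {input_name: input_array})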
Working:
1) Preprocess images -
first, the target human image is preprocessed with OpenPose and human parsing:
OpenPose - extracts pose information for the body joints
human parsing - segments the image into different parts (face, body, background) so we can
determine where to apply diffusion using a mask
the mask from human parsing is merged onto the original human image, and that is what we feed into the diffusion model (a mask-merging sketch follows)
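A minimal sketch of the mask-merging idea, assuming the human-parse output has already been turned into a binary mask of the region to repaint (function and variable names are illustrative, not the repo's actual API):

import numpy as np
from PIL import Image

def apply_mask(human_img: Image.Image, mask: Image.Image) -> Image.Image:
    """Gray out the masked region of the person image - the area the diffusion model will repaint."""
    img = np.array(human_img).astype(np.float32)
    m = (np.array(mask.convert("L")) > 127)[..., None]  # HxWx1 boolean mask
    img = np.where(m, 127.5, img)                       # neutral gray where diffusion happens
    return Image.fromarray(img.astype(np.uint8))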
Processing the cloth image -
with torch.no_grad():
    # encode the garment image with the image encoder to get an image embedding
    prompt_image = self.auto_processor(images=image_garm, return_tensors="pt").to('cuda')
    prompt_image = self.image_encoder(prompt_image.data['pixel_values']).image_embeds
    prompt_image = prompt_image.unsqueeze(1)
    if model_type == 'hd':
        # 'hd' model: overwrite the text token embeddings (after the first token) with the garment image embedding
        prompt_embeds = self.text_encoder(self.tokenize_captions([""], 2).to('cuda'))[0]
        prompt_embeds[:, 1:] = prompt_image[:]
    elif model_type == 'dc':
        # 'dc' model: encode the category text and concatenate the garment image embedding to it
        prompt_embeds = self.text_encoder(self.tokenize_captions([category], 3).to('cuda'))[0]
        prompt_embeds = torch.cat([prompt_embeds, prompt_image], dim=1)
This converts the cloth image into an image embedding and generates a prompt embedding from the category we provide.
GatedSelfAttentionDense: this class combines visual features and object features using self-attention.
It is likely used to fuse information about the clothing item with the human body image (a simplified sketch follows).
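A simplified sketch of what a gated self-attention fusion block can look like; the actual GatedSelfAttentionDense in the repo may differ in details such as projections and feed-forward layers:

import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """Fuses visual tokens with object (garment) tokens via self-attention, scaled by a learned gate."""
    def __init__(self, dim: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # starts at 0, so the block is a no-op at init

    def forward(self, visual: torch.Tensor, objs: torch.Tensor) -> torch.Tensor:
        # visual: (B, N_vis, dim), objs: (B, N_obj, dim)
        n_vis = visual.shape[1]
        x = self.norm(torch.cat([visual, objs], dim=1))  # self-attend over the joint sequence
        out, _ = self.attn(x, x, x)
        # add only the visual-token part back, scaled by the learned gate
        return visual + torch.tanh(self.gate) * out[:, :n_vis]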
2) At last we feed both the masked human image and
the concatenated cloth-image embedding and prompt embedding - [image_embeds, prompt_embeds] -
into the diffusion model and run inference:
first it converts the image input into a latent embedding using the VAE,
then performs diffusion with the parameters we provided (samples, num_steps, noise, seed, etc.),
and after that number of diffusion steps it converts the output back into image space using the VAE -
and that's our output image (an illustrative denoising loop follows).
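For reference, an illustrative latent-diffusion denoising loop using diffusers-style components (vae, unet, scheduler are assumed to be loaded already); the real try-on pipeline adds masking and garment conditioning on top of this, so treat it as a sketch, not the Space's actual code:

import torch

@torch.no_grad()
def run_diffusion(vae, unet, scheduler, image, prompt_embeds, num_steps=10, seed=0):
    # image: (B, 3, H, W) tensor normalized to [-1, 1]
    generator = torch.Generator(device=image.device).manual_seed(seed)

    # 1) encode the (masked) person image into VAE latent space
    latents = vae.encode(image).latent_dist.sample(generator) * vae.config.scaling_factor

    # 2) add noise and iteratively denoise with the UNet
    scheduler.set_timesteps(num_steps)
    noise = torch.randn(latents.shape, generator=generator, device=latents.device, dtype=latents.dtype)
    latents = scheduler.add_noise(latents, noise, scheduler.timesteps[:1])
    for t in scheduler.timesteps:
        noise_pred = unet(latents, t, encoder_hidden_states=prompt_embeds).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # 3) decode the denoised latents back into image space
    return vae.decode(latents / vae.config.scaling_factor).sample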