Much higher COCO FID_5k score than in the ADD paper

#31
by guangersu - opened

While reproducing the ADD paper, I used the Hugging Face example code to generate 5k COCO images and compute the FID. The FID I got for sdxl-turbo was 30, which is higher than the 20 reported in the paper.
The code is:

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]

If I want to reproduce the results of the original paper, do you have any suggestions? Thank you very much.

Yes, just keep pushing that way. It seems like a clean commit and a wise move. I will try it and let you know. Thank you!

Hi @ramonzero343, I also found that the model performs worse on COCO FID_5k than the paper reports. However, SD v1.5 (50 steps) gets ~22, which suggests the evaluation codebase is correct. The codebase we used is https://github.com/autonomousvision/stylegan-t. Will the commit affect the results? How long will it take until the Hugging Face model is updated?

Best,
Peiqin

@ramonzero343 Thanks for your reply and help. I use the COCO val 2017 dataset with its 5k images, and for each image I randomly select one of its five captions as the prompt to generate a picture with sdxl-turbo. Thank you very much.

@ramonzero343 The number of generated images used to calculate COCO-FID_5k is 5k, and the number of prompts used to generate images is also 5k. Each prompt is randomly selected from the captions of every sample in the COCO2017 validation set.
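For anyone trying to match this setup, here is a minimal sketch of the prompt selection described above: one caption drawn at random per image. It assumes the standard COCO captions JSON layout (an "annotations" list whose entries carry "image_id" and "caption"). The function names, the seed, and the file path in the usage note are my own illustrative choices, not something taken from the paper or from either repo mentioned in this thread.

```python
import json
import random
from collections import defaultdict


def pick_one_caption_per_image(annotations, seed=0):
    """Return {image_id: caption}, choosing one caption per image at random.

    `annotations` is a list of dicts with "image_id" and "caption" keys,
    as found in the "annotations" list of a COCO captions file.
    """
    captions_by_image = defaultdict(list)
    for ann in annotations:
        captions_by_image[ann["image_id"]].append(ann["caption"].strip())

    # A fixed seed keeps the prompt set identical across runs, so FID
    # numbers from different models are computed on the same prompts.
    rng = random.Random(seed)
    return {
        image_id: rng.choice(captions)
        for image_id, captions in sorted(captions_by_image.items())
    }


def load_coco_prompts(path, seed=0):
    """Hypothetical helper: read a COCO captions JSON and pick prompts."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return pick_one_caption_per_image(data["annotations"], seed=seed)
```

Usage would look like `prompts = load_coco_prompts("captions_val2017.json")` (the path is an assumption), which should give 5k prompts for the 5k validation images. Note that a different seed gives a different prompt set, which may shift the FID slightly.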

SDXL-Turbo does have a higher FID score than SD-Turbo. This is reported in their paper. By the way, can you share the script for computing the FID score?

We ran the experiments with this repo: https://github.com/NVlabs/stylegan3/tree/main @djdjdj666
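For reference, the distance that FID computes can be sketched in a few lines of NumPy. This is not the script from either repo linked in this thread, only an illustration of the Fréchet distance between two Gaussians fitted to feature vectors. It assumes the features (normally Inception activations) were already extracted into arrays of shape (N, D); the function name is hypothetical. The trace of the matrix square root is taken from the eigenvalues of the covariance product, which avoids a SciPy dependency.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are float arrays of shape (num_samples, feature_dim).
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # tr(sqrt(sigma_r @ sigma_f)) equals the sum of the square roots of the
    # eigenvalues of the product. They are real and non-negative in exact
    # arithmetic, so drop tiny imaginary parts and clip rounding noise.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(
        diff @ diff + np.trace(sigma_r) + np.trace(sigma_f) - 2.0 * trace_sqrt
    )
```

With only 5k samples the covariance estimate is noisy, so scores also depend on details outside this function, such as the feature extractor weights, the image resizing method, and the reference image set. Those are worth checking when numbers from two codebases disagree.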
