Different inference results

#6
by hhtian - opened

Hi, I deployed a CLI inference service based on llava-v1.5-13b on my own server, but the inference results are much worse than your deployment's. Could you tell me the possible reason, or the steps you took to build your deployment? Thank you so much!

Hi @hhtian , with this limited information it is not possible (at least for me :) to understand the exact problem. Can you share more details, such as what your inference results are, and why and in what way they are worse (any quantitative or qualitative comparison)?

@badayveda Thank you for your reply! We used the LLaVA model to find out what a person is holding in an image. With your deployment the results are very good, but when we upload the same picture to our service with the same prompt, the answer is different: our service can't precisely recognize what the person is holding.

We compared the code of the Gradio service and the CLI service and found that the difference is in how the image is loaded into the model. The Gradio service does `image = Image.open(os.path.join(self.image_folder, image_file)).convert('RGB')` followed by `image_tensor = process_images([image], self.image_processor, self.model_config)[0]`, while the CLI service does `image = Image.open(image_file)`, then `image = expand2square(image, tuple(int(x * 255) for x in self.image_processor.image_mean))`, then `image_tensor = self.image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0]`.

After we changed the image-loading method, the outputs improved, but they are still different from your service's results. For example, when asking what the person is holding, for the same picture your service says she is holding a watch, but our service says she is holding a clock. We still can't figure out what causes the difference.
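For anyone comparing the two paths, here is a minimal sketch of how the two loading methods relate. It is not the exact deployment code; it assumes the helpers shipped in the LLaVA repo (`llava/mm_utils.py`, `llava/model/builder.py`), and the image path is a placeholder.

```python
# Minimal sketch of the two preprocessing paths discussed above, using the
# helpers from the LLaVA repo. "example.jpg" is a placeholder image path.
from PIL import Image

from llava.mm_utils import expand2square, process_images
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, _ = load_pretrained_model(
    "liuhaotian/llava-v1.5-13b", None, "llava-v1.5-13b"
)

# Gradio-style path: force RGB, then let process_images() apply the padding
# strategy configured on the model (the v1.5 configs set
# image_aspect_ratio == "pad", which pads to a square internally).
image = Image.open("example.jpg").convert("RGB")
tensor_gradio = process_images([image], image_processor, model.config)[0]

# CLI-style path: pad to a square by hand with the processor's mean color,
# then call the CLIP image processor directly.
padded = expand2square(
    image, tuple(int(x * 255) for x in image_processor.image_mean)
)
tensor_cli = image_processor.preprocess(padded, return_tensors="pt")["pixel_values"][0]

# With image_aspect_ratio == "pad" and .convert("RGB") applied in both paths,
# the two tensors should match.
```

Note that the CLI snippet quoted in the thread skips `.convert('RGB')`, which matters for RGBA or palette images. And even with identical preprocessing, generation settings can change the wording: if the service samples with a temperature above zero, answers like "watch" vs. "clock" can vary between runs, so comparing with greedy decoding (`do_sample=False`) is a fairer test.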