nlpconnect/vit-gpt2-image-captioning · How to get confidence level for the classification

Oct 31, 2022

I would like to ask if there is a way to get both the caption for the image and the confidence level in the predicted caption as a float from 0 to 1 (0 means completely uncertain, 1 means completely certain).

ankur310794

NLP Connect org Nov 2, 2022

Hi @Caridorc
Yes, you can. The generate function has two flags, return_dict_in_generate and output_scores; you have to enable both.

Try this:

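from PIL import Image

# model, feature_extractor, tokenizer and device are assumed to be loaded
# beforehand (for example as shown in the model card).
max_length = 16
num_beams = 4
gen_kwargs = {"max_length": max_length,
              "num_beams": num_beams,
              "output_scores": True,            # keep the generation scores
              "return_dict_in_generate": True}  # return an output object, not a bare tensor


def predict_step(image_paths):
  images = []
  for image_path in image_paths:
    i_image = Image.open(image_path)
    if i_image.mode != "RGB":
      i_image = i_image.convert(mode="RGB")

    images.append(i_image)

  pixel_values = feature_extractor(images=images, return_tensors="pt").pixel_values
  pixel_values = pixel_values.to(device)

  outputs = model.generate(pixel_values, **gen_kwargs)
  # sequences_scores is only available with beam search (num_beams > 1).
  # It holds one length-normalized log-probability per caption (<= 0),
  # not a probability between 0 and 1.
  scores = outputs.sequences_scores
  preds = tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True)
  preds = [pred.strip() for pred in preds]
  return preds, scores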
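
Note that sequences_scores are beam-search log-scores (zero or negative), not values between 0 and 1. One rough way to get the kind of number asked for is to exponentiate them. The sketch below is only an illustration: the helper name scores_to_confidence and the example scores are made up, and the result is closer to an average per-token probability than a calibrated "certainty", so treat it as a relative measure.

```python
import math


def scores_to_confidence(sequence_scores):
    """Map beam-search log-scores (<= 0) to values in (0, 1] with exp().

    Accepts any iterable of numbers, e.g. the second value returned by
    predict_step after calling .tolist() on it.
    """
    return [math.exp(float(score)) for score in sequence_scores]


# Hypothetical log-scores for two captions (not real model output).
example_scores = [-0.25, -1.5]
confidences = scores_to_confidence(example_scores)

for score, confidence in zip(example_scores, confidences):
    print(f"log-score {score:+.2f} -> confidence {confidence:.3f}")
```

A score of 0 maps to 1.0, and more negative scores map closer to 0, so captions can at least be ranked or thresholded on this value.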

Please refer to these resources for more:

https://huggingface.co/blog/how-to-generate
https://huggingface.co/docs/transformers/v4.24.0/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate

ankur310794 changed discussion status to closed Nov 3, 2022