Can you add / mirror the AutoProcessor here?

#11
by michaelfeil - opened

Would be great to have it all in one repo!

Jina AI org

Hey @michaelfeil, thanks for reaching out! I am assuming you mean that AutoProcessor.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True) is not working, right? Can you try again? I added a small fix.
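
For reference, a minimal sanity check (a sketch, assuming a recent transformers release) would be:

from transformers import AutoProcessor

# trust_remote_code is required because the processor class lives in the model repo,
# not in the transformers library itself
processor = AutoProcessor.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)
print(type(processor))  # should resolve to the custom processor class from the repo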

Thanks - there might be an issue when using the AutoProcessor. Does this look familiar?

  File "/home/michael/infinity/libs/infinity_emb/infinity_emb/transformer/vision/torch_vision.py", line 85, in encode_pre
    preprocessed = self.processor(
  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/transformers/models/clip/processing_clip.py", line 97, in __call__
    tokenizer_kwargs = {k: v for k, v in kwargs.items() if k not in self.image_processor._valid_processor_keys}
  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/transformers/models/clip/processing_clip.py", line 97, in <dictcomp>
    tokenizer_kwargs = {k: v for k, v in kwargs.items() if k not in self.image_processor._valid_processor_keys}
AttributeError: 'JinaCLIPImageProcessor' object has no attribute '_valid_processor_keys'
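
For anyone hitting the same error before a fix lands, a possible stopgap (assuming the only missing piece is the _valid_processor_keys attribute that newer CLIPProcessor code expects on the image processor) is to set it manually:

processor = AutoProcessor.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)

# CLIPProcessor.__call__ splits extra kwargs against this attribute; an empty list
# routes everything to the tokenizer, so image-processor-specific kwargs would be lost
if not hasattr(processor.image_processor, '_valid_processor_keys'):
    processor.image_processor._valid_processor_keys = []

This is only a band-aid until the custom image processor defines the attribute itself.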

The hope would be that this example works! (That would be a big step for compatibility with the rest of the CLIP ecosystem.)

from PIL import Image
import requests

from transformers import AutoProcessor, AutoModel

model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # label probabilities
Jina AI org

@michaelfeil can you try again?

@gmastrapas Great, thank you! Almost working!

FYI: if I use fp16, I get unexpected behaviour - likely the inputs are not cast in the AutoModel, or .half() is not respected in the modelling code.
model = model.half()

  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/michael/.cache/huggingface/modules/transformers_modules/jinaai/jina-clip-implementation/b8455774f0b1d38fac760eb4d654241b1a8406a9/eva_model.py", line 761, in forward
    x = self.forward_features(x)
  File "/home/michael/.cache/huggingface/modules/transformers_modules/jinaai/jina-clip-implementation/b8455774f0b1d38fac760eb4d654241b1a8406a9/eva_model.py", line 718, in forward_features
    x = self.patch_embed(x)
  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/michael/.cache/huggingface/modules/transformers_modules/jinaai/jina-clip-implementation/b8455774f0b1d38fac760eb4d654241b1a8406a9/eva_model.py", line 471, in forward
    x = self.proj(x).flatten(2).transpose(1, 2)
  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
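
As a stopgap until a proper fix, casting the floating point inputs to match the fp16 weights seems to avoid the dtype mismatch (a sketch, assuming pixel_values is the only float tensor in the batch and the model runs on CUDA, where fp16 conv kernels exist):

inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)
# cast only floating point tensors (pixel_values) to fp16, leave integer input_ids as-is
inputs = {k: (v.half() if v.is_floating_point() else v) for k, v in inputs.items()}
outputs = model(**inputs)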
Jina AI org

@michaelfeil autocasting added, should be working now. Only on CUDA though, because slow_conv does not support half precision on CPU.
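
With the autocasting in place, a usage sketch on a CUDA device (same model id and processor as above, a GPU with fp16 support assumed) could look like:

import torch

model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)
model = model.half().to('cuda').eval()

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
inputs = {k: v.to('cuda') for k, v in inputs.items()}  # move tensors to the same device as the model

with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)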

Awesome! Great work @gmastrapas

michaelfeil changed discussion status to closed
