Phi-3 Vision visual encoder

#2 · opened by the-future-dev

Was a paper published about this vision model?
Which visual encoder was used?

It looks like a CLIP encoder.

Microsoft org • edited May 21

We use CLIP-L, the paper will be released later today.
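
For reference, the encoder named here corresponds to the public CLIP ViT-L/14-336 checkpoint. A minimal sketch of running that checkpoint on its own (not Phi-3-Vision's own forward pass) to see the token layout it produces:

```python
# Sketch: run the public CLIP-L/14-336 encoder on a dummy image.
# This is the standalone OpenAI checkpoint, not Phi-3-Vision's pipeline.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

name = "openai/clip-vit-large-patch14-336"
processor = CLIPImageProcessor.from_pretrained(name)
encoder = CLIPVisionModel.from_pretrained(name)

image = Image.new("RGB", (1344, 1344))  # dummy image at the stated maximum size
pixels = processor(images=image, return_tensors="pt").pixel_values  # resized/cropped to 336x336

with torch.no_grad():
    out = encoder(pixel_values=pixels)

print(out.last_hidden_state.shape)  # torch.Size([1, 577, 1024]): CLS + 24*24 patches, hidden dim 1024
```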

What is the resolution of the image input?

Microsoft org

The resolution is dynamic, based on the input image's aspect ratio. The maximum resolution is 1344x1344.
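
As a rough illustration of that answer: 1344 is exactly 4 × 336, the CLIP-L/14-336 input size, so the dynamic resolution plausibly corresponds to up to a 4×4 grid of 336-pixel crops. The `crop_grid` rule below is an assumption made for illustration, not the released preprocessing code:

```python
# Hedged sketch of the tiling arithmetic implied by the answer above.
import math

CLIP_CROP = 336   # CLIP-L/14-336 input size
MAX_SIDE = 1344   # stated maximum resolution (4 * 336)

def crop_grid(width: int, height: int) -> tuple[int, int]:
    """Hypothetical rule: how many 336px crops tile the image, capped at 1344px per side."""
    w, h = min(width, MAX_SIDE), min(height, MAX_SIDE)
    return math.ceil(w / CLIP_CROP), math.ceil(h / CLIP_CROP)

print(crop_grid(672, 336))    # (2, 1) -> two crops for a wide 672x336 image
print(crop_grid(4000, 3000))  # (4, 4) -> clamped to the 1344x1344 maximum
```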


Where is the paper? Please share the link.


Is the visual encoder frozen during training?


Are you going to release the paper and the fine-tuning code?

```
'img_processor': {'image_dim_out': 1024, 'model_name': 'openai/clip-vit-large-patch14-336', 'name': 'clip_vision_model', 'num_img_tokens': 144}
```
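
A hedged cross-check of those numbers against the public CLIP-L/14-336 config: `image_dim_out` matches the ViT-L hidden size, and `num_img_tokens` = 144 is consistent with a 4× reduction of the 24×24 patch grid (the exact token-reduction step is an assumption here; the technical report describes the actual design):

```python
# Cross-check the img_processor values above against the public CLIP-L/14-336 config.
from transformers import CLIPVisionConfig

cfg = CLIPVisionConfig.from_pretrained("openai/clip-vit-large-patch14-336")

patches_per_side = cfg.image_size // cfg.patch_size  # 336 // 14 = 24
print(cfg.hidden_size)             # 1024 == image_dim_out
print(patches_per_side ** 2)       # 576 patch tokens per 336x336 crop
print(patches_per_side ** 2 // 4)  # 144 == num_img_tokens (assuming a 2x2 merge of patches)
```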

Please share the paper URL here or in the model card.

Microsoft org

The updated Phi-3 Technical Report is available at https://arxiv.org/pdf/2404.14219

nguyenbh changed discussion status to closed
