This repository contains https://huggingface.co/microsoft/Florence-2-large with ONNX weights, making it compatible with Transformers.js.

Usage (Transformers.js)

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @huggingface/transformers

Example: Perform image captioning with onnx-community/Florence-2-large.

import {
    Florence2ForConditionalGeneration,
    AutoProcessor,
    AutoTokenizer,
    RawImage,
} from '@huggingface/transformers';

// Load model, processor, and tokenizer
const model_id = 'onnx-community/Florence-2-large';
const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        embed_tokens: 'fp16', // or 'fp32'
        vision_encoder: 'fp16', // or 'fp32'
        encoder_model: 'q4',
        decoder_model_merged: 'q4',
    },
});
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// Load image and prepare vision inputs
const url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Specify task and prepare text inputs
const task = '<MORE_DETAILED_CAPTION>';
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);

// Generate text
const generated_ids = await model.generate({
    ...text_inputs,
    ...vision_inputs,
    max_new_tokens: 256,
});

// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, { skip_special_tokens: false })[0];

// Post-process the generated text
const result = processor.post_process_generation(generated_text, task, image.size);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'The image shows a vintage Volkswagen Beetle car parked on a cobblestone street in front of a yellow building with two wooden doors. The car is a bright turquoise color and has a classic design with a round body and a sloping roofline. It has two doors on either side of the car, one on the left side and one in the center, with a brown door on the right side. The doors are made of wood and have a rustic, weathered look. The building behind the car is painted in a light yellow color and appears to be old and dilapidated. The sky is blue and there are trees in the background. The image is taken from a low angle, looking up at the car and the building.' }
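The `dtype` option in the example sets a precision for each ONNX sub-model: the token embeddings and vision encoder use 'fp16' (or 'fp32'), while the encoder and merged decoder use 'q4'. Below is a minimal sketch of a hypothetical helper, `buildFlorenceDtype` (not part of Transformers.js), that switches the two fp16 modules to fp32, for example on hardware where fp16 is unavailable, while leaving the q4 modules unchanged. How you decide between the two precisions is up to you; the boolean flag is an assumption for illustration.

```javascript
// Hypothetical helper (not a Transformers.js API): builds the per-module
// `dtype` object used in the Florence-2 example.
// - embed_tokens and vision_encoder switch between 'fp16' and 'fp32'
// - encoder_model and decoder_model_merged stay at 'q4', as in the example
function buildFlorenceDtype(useFp16) {
  const precision = useFp16 ? 'fp16' : 'fp32';
  return {
    embed_tokens: precision,
    vision_encoder: precision,
    encoder_model: 'q4',
    decoder_model_merged: 'q4',
  };
}

// Example: request fp32 for the embedding and vision modules.
const dtype = buildFlorenceDtype(false);

// Usage (sketch): pass the object as the `dtype` option, e.g.
// const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, { dtype });
```

Keeping the choice in one function means the fp16/fp32 decision lives in a single place instead of being repeated per module.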

We also released an online demo, which you can try yourself: https://huggingface.co/spaces/Xenova/florence2-webgpu


Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named onnx).
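With weights in an `onnx` subfolder, each sub-model is stored as a separate file whose name carries a dtype suffix. The sketch below assumes the suffix convention used by Transformers.js v3 ('' for fp32, `_fp16`, `_q4`, `_quantized` for q8, and so on); verify it against the library version you install. The helper `expectedOnnxFiles` is hypothetical and only lists the file paths a given per-module `dtype` config would look for, which can help when structuring your own repo.

```javascript
// Assumed dtype -> filename-suffix convention (mirrors Transformers.js v3;
// check against your installed version before relying on it).
const DTYPE_SUFFIX = {
  fp32: '',
  fp16: '_fp16',
  int8: '_int8',
  uint8: '_uint8',
  q8: '_quantized',
  q4: '_q4',
  q4f16: '_q4f16',
  bnb4: '_bnb4',
};

// Hypothetical helper: maps a per-module dtype config to the ONNX file
// paths it would require, assuming weights live in `subfolder`.
function expectedOnnxFiles(dtypeConfig, subfolder = 'onnx') {
  return Object.entries(dtypeConfig).map(([moduleName, dtype]) => {
    if (!Object.prototype.hasOwnProperty.call(DTYPE_SUFFIX, dtype)) {
      throw new Error(`Unknown dtype "${dtype}" for module "${moduleName}"`);
    }
    return `${subfolder}/${moduleName}${DTYPE_SUFFIX[dtype]}.onnx`;
  });
}

// The dtype config from the usage example above.
const files = expectedOnnxFiles({
  embed_tokens: 'fp16',
  vision_encoder: 'fp16',
  encoder_model: 'q4',
  decoder_model_merged: 'q4',
});
// → ['onnx/embed_tokens_fp16.onnx', 'onnx/vision_encoder_fp16.onnx',
//    'onnx/encoder_model_q4.onnx', 'onnx/decoder_model_merged_q4.onnx']
```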