---
library_name: transformers
license: mit
datasets:
- pierreguillou/DocLayNet-small
language:
- en
pipeline_tag: image-text-to-text
---

# Model Card for Model ID

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** [Mit Patel]
- **Shared by [optional]:** [Mit Patel]
- **Finetuned from model [optional]:** https://huggingface.co/microsoft/Florence-2-base-ft

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Inference Procedure

```python
# Install the dependencies first (from a shell, or prefix with "!" in a notebook cell):
#   pip install -qU transformers
#   pip install -qU accelerate bitsandbytes einops flash_attn timm
#   pip install -q datasets

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Run on GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# The base checkpoint provides the processor and the config used to load the fine-tuned weights.
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Mit1208/Florence-2-DocLayNet", trust_remote_code=True, config=base_model.config
).to(device)
model.eval()


def run_example(task_prompt, image, text_input=None):
    # Build the prompt from the task token plus any optional extra text.
    prompt = task_prompt if text_input is None else task_prompt + text_input
    print(prompt)

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        early_stopping=False,
        do_sample=False,
        num_beams=3,
    )

    # Keep special tokens: the post-processor needs them to parse locations.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    print(generated_text)

    parsed_answer = processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )
    return parsed_answer


image = Image.open('form-1.png').convert('RGB')
# The task token is empty in the source; Florence-2 task prompts are tokens such as '<OD>'.
task_prompt = ''
results = run_example(task_prompt, image.resize(size=(1000, 1000)))
print(results)
```
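
The result returned by `run_example` is expressed in the coordinates of the resized 1000×1000 image, so it usually has to be mapped back to the original page before it can be visualised. The following is a minimal sketch of that step, not part of the original card. It assumes the parsed answer has the shape Florence-2 uses for object detection, a dictionary with `bboxes` (lists of `[x1, y1, x2, y2]`) and `labels`. The helper names `scale_boxes` and `draw_layout`, the sample detection, and the blank stand-in page are all hypothetical.

```python
from PIL import Image, ImageDraw


def scale_boxes(bboxes, from_size, to_size):
    """Map [x1, y1, x2, y2] boxes from one (width, height) to another."""
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    return [[x1 * sx, y1 * sy, x2 * sx, y2 * sy] for x1, y1, x2, y2 in bboxes]


def draw_layout(image, detections, color="red"):
    """Return a copy of `image` with every labelled box outlined."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for box, label in zip(detections["bboxes"], detections["labels"]):
        # Round to whole pixels before drawing.
        draw.rectangle([round(v) for v in box], outline=color, width=2)
        draw.text((round(box[0]) + 4, round(box[1]) + 4), label, fill=color)
    return annotated


# Hypothetical detection, shaped like Florence-2's object-detection output
# (assumption: in practice this would be results[task_prompt]).
sample = {"bboxes": [[100.0, 200.0, 500.0, 400.0]], "labels": ["Text"]}

# Blank stand-in for the original page; the model saw a 1000x1000 resize of it.
original = Image.new("RGB", (500, 2000), "white")
sample["bboxes"] = scale_boxes(sample["bboxes"], (1000, 1000), original.size)
annotated = draw_layout(original, sample)
```

Because `draw_layout` works on a copy, the input image stays untouched and the annotated page can be saved separately, for example with `annotated.save(...)`.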