license: mit

Florence-2-base-PromptGen

Florence-2-base-PromptGen is a model trained for MiaoshouAI Tagger for ComfyUI. It is an advanced image captioning tool based on the Microsoft Florence-2 model and fine-tuned specifically for prompt generation and image tagging.

Why another tagging model?

Most vision models today are trained mainly for general vision-recognition purposes, but the format and level of detail expected when prompting and when tagging images for model training are quite different.

Florence-2-base-PromptGen is trained specifically for this purpose, aiming to improve the accuracy and experience of prompting and tagging. The model is trained on images and cleaned tags from Civitai, so the captions it produces for an image match the kind of prompts used to generate such images.

Instruction prompt:

A new instruction prompt, <GENERATE_PROMPT>, is added for this purpose alongside <DETAILED_CAPTION> and <MORE_DETAILED_CAPTION>. It responds in Danbooru tagging style with much better accuracy and an appropriate level of detail.
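
The task prompt you pass to the processor is what selects the output style. A minimal sketch (the dictionary name is purely illustrative, and model, processor, image and device are assumed to be set up as in the "How to use" section below):

task_prompts = {
    "tags": "<GENERATE_PROMPT>",                # Danbooru-style tag list
    "caption": "<DETAILED_CAPTION>",            # detailed natural-language caption
    "long_caption": "<MORE_DETAILED_CAPTION>",  # longer, more detailed caption
}

# Request tags instead of a caption simply by swapping the task prompt:
inputs = processor(text=task_prompts["tags"], images=image, return_tensors="pt").to(device)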

Version History:

v0.8: New instruction <GENERATE_PROMPT> added and trained.

v0.9: Improved vision capability on uncensored data for <DETAILED_CAPTION> and <MORE_DETAILED_CAPTION>.

How to use:

To use this model, you can load it directly from the Hugging Face Model Hub:


import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Pick a device; the model also runs on CPU, just more slowly.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen", trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen", trust_remote_code=True)

prompt = "<GENERATE_PROMPT>"

# Example image; replace with your own.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))

print(parsed_answer)
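
If you are captioning a whole training dataset rather than a single image, the same calls can be wrapped in a small loop. The sketch below is not part of the model's API; it assumes the model, processor and device from the snippet above and a hypothetical train_images folder:

import os
from PIL import Image

def tag_image(image_path, task="<GENERATE_PROMPT>"):
    # Run one task prompt on a single image and return the parsed result.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=task, images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(generated_text, task=task, image_size=(image.width, image.height))

image_dir = "train_images"  # hypothetical folder of training images
for name in os.listdir(image_dir):
    if name.lower().endswith((".jpg", ".jpeg", ".png", ".webp")):
        result = tag_image(os.path.join(image_dir, name))
        # Florence-2 post-processing returns a dict keyed by the task prompt;
        # inspect print(parsed_answer) above if your version returns a different structure.
        tags = result["<GENERATE_PROMPT>"]
        # Write one .txt caption file per image, the layout most trainers expect.
        with open(os.path.join(image_dir, os.path.splitext(name)[0] + ".txt"), "w") as f:
            f.write(tags)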

Use with MiaoshouAI Tagger for ComfyUI

If you just want to use this model in your workflow, you can do so through ComfyUI-Miaoshouai-Tagger:

https://github.com/miaoshouai/ComfyUI-Miaoshouai-Tagger

Detailed installation and usage instructions are provided there.