kanashi6/UFO-InternVL2-8B-reasonseg-ft


	---
	license: apache-2.0
	library_name: transformers
	pipeline_tag: image-text-to-text
	---

	This repository contains the model presented in the paper [UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface](https://hf.co/papers/2503.01342).

	UFO unifies object-level detection, pixel-level segmentation, and image-level vision-language tasks into a single model by transforming all perception targets into the language space. It introduces a novel embedding retrieval approach that relies solely on the language interface to support segmentation tasks.
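The README does not spell out how embedding retrieval produces a mask, so the following is only a minimal sketch of the general idea, not the repository's code. It assumes the language model emits one embedding for a mask token, that this embedding is compared with per-patch image features by dot product, and that patches scoring above a threshold form the mask. The function name `retrieve_mask`, its parameters, and the toy values are all hypothetical.

```python
from typing import List


def retrieve_mask(
    mask_embedding: List[float],
    patch_features: List[List[float]],
    grid_h: int,
    grid_w: int,
    threshold: float = 0.0,
) -> List[List[int]]:
    """Hypothetical helper: build a binary mask by similarity retrieval.

    Assumption (not confirmed by this README): each image patch has a
    feature vector, and a patch belongs to the mask when its dot product
    with the mask-token embedding exceeds `threshold`.
    """
    if len(patch_features) != grid_h * grid_w:
        raise ValueError("expected one feature vector per grid cell")

    # Dot-product similarity between the mask embedding and every patch.
    scores = [
        sum(p * m for p, m in zip(patch, mask_embedding))
        for patch in patch_features
    ]

    # Reshape the flat score list (row-major) into a grid and threshold it.
    return [
        [1 if scores[r * grid_w + c] > threshold else 0 for c in range(grid_w)]
        for r in range(grid_h)
    ]


# Toy 2x2 patch grid with 2-dimensional features (made-up numbers).
toy_mask_embedding = [1.0, 0.0]
toy_patches = [[0.9, 0.1], [-0.5, 0.3], [0.2, -0.4], [-0.1, 0.8]]
toy_mask = retrieve_mask(toy_mask_embedding, toy_patches, grid_h=2, grid_w=2)
```

In this sketch the mask comes out at patch resolution; a real pipeline would presumably upsample it to the image size, which is left out here.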

	For more details, please refer to the original paper and the GitHub repository:

	- Paper: [UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface](https://hf.co/papers/2503.01342)
	- GitHub: [https://github.com/nnnth/UFO](https://github.com/nnnth/UFO)