HunyuanDiT

Why suddenly open-source a text-to-image model? What was the team's reasoning?

#3
by you111 - opened

As in the title.

OpenAI has been building hype lately, so open-sourcing now obviously has its advantages.

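The architecture is different. This is a set of experiments based on DiT. It follows the prompt more closely, so you need fewer re-rolls to get a good image than with a pure diffusion model.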

Most of the open-source text-to-image community still builds on U-Net-based models. The DiT architecture has great potential, yet the open-source community still has a gap when it comes to DiT models with Chinese-language support.
The Tencent Hunyuan text-to-image team, arriving independently at the same direction as the teams behind Sora and Stable Diffusion, was among the first to explore visual generation models based on the DiT architecture. In our tests, the results rank among the best in the open-source community.
We chose to open-source our latest research so that the industry can build from a higher starting point, and we hope to contribute Tencent's share to the open-source ecosystem for DiT-based text-to-image models. Hunyuan text-to-image is also the first Chinese-native DiT model, with bilingual understanding and generation in Chinese and English. It will bring more Chinese elements into the open-source ecosystem and related datasets, and better serve the needs of Chinese enterprises and developers.

🚀🚀🚀

yestinl changed discussion status to closed
