The Llama3-V project is stealing a lot of academic work from MiniCPM-Llama3-V 2.5!

#23
by pzc163 - opened

Fellow MiniCPM-Llama3-V 2.5 project authors, a few days ago I discovered a shocking fact. A large amount of work in the Llama3-V project (https://github.com/mustafaaljadery/llama3v) appears to have been stolen from the MiniCPM-Llama3-V 2.5 project. I raised my question in a GitHub issue on the Llama3-V repository, and did not expect that the authors of Llama3-V would quickly delete my issue and hide Llama3-V's Hugging Face project page. I strongly question what they did. I will lay out all the evidence below, and I urge you to pay attention to this matter.

Fact 1: The Llama3-V project uses exactly the same model structure and code as the MiniCPM-Llama3-V 2.5 project
Llama3-V has exactly the same model structure and config file as MiniCPM-Llama3-V 2.5, differing only in variable names. Left: MiniCPM-Llama3-V 2.5 Right: Llama3-V
image.png
Its code appears to be MiniCPM-Llama3-V 2.5's code with some reformatting and variable renaming, including but not limited to image slicing, the tokenizer, the resampler, and data loading. Here are some examples.
image.png
The author of Llama3-V cites LLaVA-UHD for the architecture and lists the differences (in the choice of ViT and LLM). What the author does not mention is that their specific implementation is identical to MiniCPM-Llama3-V 2.5, which differs from LLaVA-UHD in many ways, such as the spatial schema. Llama3-V also has the same tokenizer as MiniCPM-Llama3-V 2.5, including the special tokens newly defined by MiniCPM-Llama3-V 2.5.

image.png

image.png

Fact 2: When I asked how the authors of Llama3-V could have used MiniCPM-Llama3-V 2.5's tokenizer before the MiniCPM-Llama3-V 2.5 project was released, the authors of the Llama3-V project began to lie.
image.png
The author of the Llama3-V project claimed that the tokenizer came from here: https://huggingface.co/openbmb/MinicPM-V-2/blob/main/tokenizer.json, which was released before the Llama 3 based MiniCPM.
But the fact is that MiniCPM-V-2's tokenizer is totally different from that of MiniCPM-Llama3-V 2.5. Below are the two files on Hugging Face. Obviously, they are not the same tokenizer file, and their file sizes are completely different.
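
For anyone who wants to check this locally instead of relying on screenshots, here is a minimal sketch that compares two downloaded files by size and SHA-256 hash. The function names and local paths are my own placeholders, not part of either project.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def same_file(path_a, path_b):
    """True only if both files have identical size and content hash."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)
```

Calling `same_file` on local copies of the two `tokenizer.json` files (the paths are wherever you saved them) returns `False` whenever sizes or contents differ.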

image.png

image.png

Moreover, MiniCPM-Llama3-V 2.5's tokenizer consists of the Llama 3 tokenizer plus a few special tokens from the MiniCPM-V model series, and MiniCPM-V-2 was released before Llama 3 was open-sourced, so MiniCPM-V-2 could not have shipped a Llama 3 based tokenizer.
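
The "base tokenizer plus a few special tokens" relationship can also be checked mechanically. Below is a rough sketch that assumes both files follow the Hugging Face `tokenizer.json` layout, with a top-level `added_tokens` list whose entries carry a `content` string. The helper names are hypothetical.

```python
import json


def load_tokenizer_json(path):
    """Parse a tokenizer.json file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def added_tokens(tokenizer_json):
    """Collect the `content` strings listed under `added_tokens`."""
    return {entry["content"] for entry in tokenizer_json.get("added_tokens", [])}


def extra_tokens(base, derived):
    """Tokens that `derived` defines on top of `base`, sorted for stable output."""
    return sorted(added_tokens(derived) - added_tokens(base))
```

If one tokenizer really is another plus a handful of extras, `extra_tokens(base, derived)` lists just those extras and `extra_tokens(derived, base)` is empty.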

Fact 3: The authors of the Llama3-V project, afraid to face questioning, deleted the issue I filed in the Llama3-V repository questioning their stealing.
Also, it seems the author does not fully understand MiniCPM-Llama3-V 2.5's architecture or their own code. Perceiver resampler is a single-layer cross-attention, not a two-layer self-attention. Sigmoid activation of SigLIP is not used for training multimodal large language models. These activations are only used for pretraining SigLIP.
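
To make the cross-attention point concrete, here is a toy, pure-Python sketch of a single cross-attention step in which a fixed set of query vectors attends over image patch features. This is a generic illustration, not code from either project: it is single-head and omits the learned projections, normalization, and positional information that a real resampler would have.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, features):
    """Queries (num_queries x dim) attend over features (num_patches x dim).

    Keys and values both come from `features`, while the queries are a
    separate input. That separation is what makes this cross-attention
    rather than self-attention.
    """
    dim = len(queries[0])
    keys_t = [list(col) for col in zip(*features)]  # dim x num_patches
    scores = matmul(queries, keys_t)                # num_queries x num_patches
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, features)                # num_queries x dim
```

The output always has one row per query, however many patches come in, which is why a resampler of this kind can compress a variable number of visual features into a fixed number of tokens.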

Llama3-V:

image.png

image.png

MiniCPM-Llama3-V 2.5:

image.png
Visual feature extraction doesn't need sigmoid activation.

image.png

image.png

image.png

Based on the three facts above, I think there is sufficient evidence that the Llama3-V project has stolen the academic work of the MiniCPM-Llama3-V 2.5 project. I strongly suggest that the MiniCPM-Llama3-V 2.5 team file a complaint to expose the Llama3-V authors' stealing, lying, and other academic misconduct. For more information, please see the GitHub issue below:
https://github.com/OpenBMB/MiniCPM-V/issues/196

OpenBMB org

Hi @pzc163 ,
Thank you for sharing this important information with us. We are deeply shocked and will be paying special attention to this matter. We are now launching an investigation to verify the above situation. Any new findings will be quickly disclosed to you, to the open-source community, and the public.

😟 This situation sounds extremely serious. We never expected anything like this to happen. We hope the truth will come to light soon.

OpenBMB org

Our response: link1 and link2

The conclusion of our investigation:

  • Llama3-V can be run using MiniCPM-Llama3-V 2.5's code and config.json after changing param names
  • It behaves similarly to MiniCPM-Llama3-V 2.5 in unrevealed experimental features trained on in-house data, e.g., recognizing Tsinghua Bamboo Characters and GUIAgent
  • It is somewhat similar to a noised version of MiniCPM-Llama3-V 2.5?

After receiving the issue from @yangzhizheng1 on GitHub, we launched a serious investigation. Following @yangzhizheng1's instructions on GitHub, we can obtain correct inference results from the Llama3-V checkpoint using MiniCPM-Llama3-V 2.5's code and config file. Moreover, we were surprised to find that Llama3-V behaves in a highly similar way to MiniCPM-Llama3-V 2.5 on some unrevealed experimental features that were trained on private in-house data, such as recognizing Tsinghua Bamboo Characters.
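
As a rough illustration of what "changing param names" amounts to, the sketch below rewrites key prefixes in a checkpoint-like dictionary. The prefixes used in the example are made-up placeholders. The actual mapping is the one described in the GitHub issue and is not reproduced here.

```python
def rename_keys(state_dict, renames):
    """Return a new dict with each matching key prefix swapped for its replacement.

    `renames` maps an old prefix to a new prefix. Values are left untouched,
    so only the names change, never the weights.
    """
    renamed = {}
    for key, value in state_dict.items():
        for old, new in renames.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        renamed[key] = value
    return renamed


# Hypothetical example: the prefixes below are placeholders, not real names.
checkpoint = {"encoder_a.layer.weight": 1, "head.weight": 2}
print(rename_keys(checkpoint, {"encoder_a.": "encoder_b."}))
```

If a checkpoint loads and produces correct outputs after nothing more than a renaming like this, the underlying architectures have to match tensor for tensor.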

One of the experimental features of MiniCPM-Llama3-V 2.5 is recognizing Tsinghua Bamboo Characters (清华简), a very special and rare type of ancient Chinese characters written on bamboo during China's Warring States Period (475 BC-221 BC). The training images were recently scanned from unearthed cultural relics and annotated by our team, and have not been publicly released yet. Surprisingly, we find highly similar capabilities in Llama3-V, in both good and bad cases.

For quantitative results, we also tested several Llama3-based VLMs on 1K Bamboo Character images and compared the exact-match rate of the predictions for each pair of models.
image

The overlap between every other pair of models is zero, whereas the overlap between Llama3-V and MiniCPM-Llama3-V 2.5 reaches a surprising 87%. Moreover, MiniCPM-Llama3-V 2.5 and Llama3-V even share a similar error distribution. Llama3-V and MiniCPM-Llama3-V 2.5 make 236 and 194 wrong predictions respectively, while the overlapping part is 182. The MiniCPM-Llama3-V2.5-noisy model, obtained by following @yangzhizheng1's instructions on GitHub, shows nearly identical quantitative results to Llama3-V. This is really confusing...
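
For readers who want to run a comparison like this on their own copies of the models, here is a minimal sketch of the pairwise exact-match computation. It assumes each model's predictions are stored as a non-empty list of strings in the same sample order. The function names are illustrative, not taken from our evaluation code.

```python
from itertools import combinations


def exact_match_overlap(preds_a, preds_b):
    """Fraction of samples on which two models output exactly the same string."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must cover the same samples")
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)


def pairwise_overlap(predictions):
    """Map each unordered model pair to its exact-match overlap."""
    return {
        (m, n): exact_match_overlap(predictions[m], predictions[n])
        for m, n in combinations(sorted(predictions), 2)
    }


def shared_error_share(errors_a, errors_b, shared):
    """Share of each model's wrong predictions that lies in the overlapping part."""
    return shared / errors_a, shared / errors_b
```

With the counts reported above, `shared_error_share(236, 194, 182)` gives roughly 0.77 and 0.94, meaning about 77% of Llama3-V's wrong predictions and about 94% of MiniCPM-Llama3-V 2.5's fall in the overlapping part.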

The same thing also happens to WebAgent, another unrevealed feature trained on in-house data. They even make identical errors in a WebAgent schema newly defined within our team...

Since the Hugging Face page of Llama3-V has now been removed, we have uploaded the checkpoint here (https://thunlp.oss-cn-qingdao.aliyuncs.com/multi_modal/llama3v.tar). Since this model received several thousand downloads on Hugging Face, there should be independent copies that can be used to reproduce these results.

Given these results, we are afraid it is hard to explain such unusual similarities as coincidences. We hope the authors can give an official explanation of the issue. We believe this is important for the common good of the open-source community.

I am sharing this misleading explanation I received as a matter of factual record.

image.png

I think it is necessary to contact USC regarding this matter. Their false advertising has had a detrimental impact, amplified by its dissemination through numerous tech media outlets. As of now, a Google search yields no fewer than 1,000 pages perpetuating these claims. This has been highly damaging, especially their assertion of surpassing GPT-4o's capabilities for a mere $500, which devalues and discredits the efforts of the entire open-source community.

image.png

image.png

Very sorry that this happened.
Our team has always been passionate about contributing open-source models to serve more people freely. Ever since MiniCPM-Llama3-V 2.5's release, we have been thrilled to see the positive feedback from the community. We believe every contribution in the open-source community is invaluable.
So far, few people have seen the truth of what happened, and we hope those who are willing will speak up so that more people know. Thanks.

LOL, Tsinghua is the GOAT!

Agreed. It seems that their actions were deliberately planned, aiming for rapid and extensive tech news coverage through attention-grabbing claims. This strategy gets the stolen credit attributed to them quickly, before the original authors even realize it.
