noob: Can we use RAG to improve cogagent/cogvlm performance

#16
by anothercoder2 - opened

Hi,
I would like to improve the performance of cogagent/cogvlm on custom images of products and services.
I was wondering if there is a route to do this using multi-modal RAG instead of fine-tuning.
I am concerned I don't have the skills to fine-tune.
Thanks
-AC

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

We haven't tried much with it yet, but I think it's a good approach, since fine-tuning CogVLM/CogAgent models is really difficult.
Note that the context length this model supports is very short (2048 tokens, a consequence of the training set), so a RAG prompt can easily exceed the token limit, and results become very poor once it does.
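To make the token-limit concern concrete, here is a minimal sketch of how retrieved text could be trimmed to fit a 2048-token window before it is sent to the model. Everything in it is an assumption for illustration: `pack_context` and `build_prompt` are hypothetical helpers, the whitespace counter is a crude stand-in for the model's real tokenizer (which will usually report more tokens), and `reserved_tokens` is a guess at how much room to leave for the image tokens and the generated answer, not a measured value for CogVLM/CogAgent.

```python
from typing import Callable, List


def count_tokens_whitespace(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def pack_context(
    question: str,
    ranked_chunks: List[str],
    max_tokens: int = 2048,          # context limit mentioned in this thread
    reserved_tokens: int = 512,      # assumed headroom for image tokens + answer
    count_tokens: Callable[[str], int] = count_tokens_whitespace,
) -> List[str]:
    """Greedily keep the highest-ranked chunks that still fit the budget."""
    budget = max_tokens - reserved_tokens - count_tokens(question)
    selected: List[str] = []
    for chunk in ranked_chunks:      # chunks are assumed sorted by relevance
        cost = count_tokens(chunk)
        if cost > budget:
            continue                 # too large; a later, smaller chunk may fit
        selected.append(chunk)
        budget -= cost
    return selected


def build_prompt(question: str, chunks: List[str]) -> str:
    """Join the selected chunks and the question into one text prompt."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    chunks = [
        "Product A is a stainless steel kettle with a 1.7 litre capacity.",
        "Product B is a ceramic teapot sold together with four cups.",
    ]
    question = "Which product is shown in the image?"
    print(build_prompt(question, pack_context(question, chunks)))
```

In practice you would pass the model's own tokenizer as `count_tokens` so the budget matches what the model actually sees; with a window this small, only a handful of short chunks will fit.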

Can the context length be extended using LongRoPE or a similar technique?
Are there any plans to do this, or instructions on how it could be done, so the open-source community can help out?
An 18B model with a 2k window does not make sense.
