noob: Can we use RAG to improve cogagent/cogvlm performance

#16

by anothercoder2 - opened Feb 5

Feb 5

Hi,
I would like to improve the performance of cogagent/cogvlm on custom images of products and services.
Was wondering if there is a route to do this using multi-modal RAG instead of fine-tuning.
I am concerned I don't have the skills to fine tune.
Thanks
-AC

zRzRzRzRzRzRzR

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org Feb 7

We haven't tried much with it yet, but I think it's a good approach, since fine-tuning CogVLM/CogAgent models is really difficult.
Note that the context length this model supports is very short (2048 tokens, a consequence of the training set), so a RAG prompt can easily exceed the token limit, and the results become very poor.
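
One way to stay under the 2048 limit is to cap how much retrieved text goes into the prompt. This is only a rough sketch, not something we have tested with CogVLM/CogAgent: `pack_context`, `count_tokens`, and the `reserved` value are all made-up names and numbers for illustration. The whitespace counter is a crude stand-in for the model's real tokenizer, and `reserved` is an assumed allowance for the image tokens and the generated answer.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in (whitespace word count); swap in the model's real tokenizer."""
    return len(text.split())


def pack_context(
    question: str,
    chunks: List[str],
    max_tokens: int = 2048,   # context limit mentioned in this thread
    reserved: int = 512,      # assumed allowance for image tokens and the answer
    counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Greedily keep retrieved chunks (most relevant first) that fit the budget."""
    budget = max_tokens - reserved - counter(question)
    if budget <= 0:
        return []  # the question alone already uses up the window
    kept: List[str] = []
    for chunk in chunks:
        cost = counter(chunk)
        if cost > budget:
            continue  # too large for what is left; a later, smaller chunk may still fit
        kept.append(chunk)
        budget -= cost
    return kept
```

The chunks are assumed to arrive sorted by relevance from whatever retriever is used, so the most useful text is kept first and the rest is dropped instead of overflowing the window.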

anothercoder2

11 days ago

Can the context length be extended using LongRoPE etc.?
Are there any plans to do this, or instructions on how it could be done, so the open-source community can help out?
An 18B model with a 2k window does not make sense.
