How to Encode Inputs for Supervised Fine-tuning of CogVLM

#17
by ayensujeremiah - opened

I am trying to fine-tune the CogVLM model on instruction datasets (QA pairs). I have mimicked the way preprocessing is done for inference, as found in `build_conversation_input_ids` in the modeling script: after creating the input ids and token type ids, I clone the input ids for the labels and mask part of them. But after training, the model's responses indicated that my preprocessing was flawed and hurt performance. It would be great if the community and the CogVLM authors could add instructions for this. A sketch of the approach I am describing is below.
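
For reference, here is a minimal sketch of the kind of preprocessing I mean. It assumes the released `build_conversation_input_ids` helper (which returns `input_ids`, `token_type_ids`, `attention_mask`, and `images`), that text positions use token type 0 as in the released modeling script, and that the model's forward pass accepts `labels` and ignores positions set to -100. The function name `build_training_example` is just an illustrative helper, not part of the CogVLM code.

```python
import torch

IGNORE_INDEX = -100  # value ignored by the standard causal-LM cross-entropy loss


def build_training_example(model, tokenizer, query, answer, image, max_length=2048):
    # Encode the prompt (image tokens + question) the same way inference does.
    prompt = model.build_conversation_input_ids(
        tokenizer, query=query, history=[], images=[image]
    )
    prompt_ids = prompt['input_ids']          # 1-D LongTensor for the prompt
    prompt_types = prompt['token_type_ids']   # same length as prompt_ids

    # Encode the answer and append EOS so the model learns to stop.
    answer_ids = tokenizer(answer, add_special_tokens=False,
                           return_tensors='pt')['input_ids'][0]
    answer_ids = torch.cat([answer_ids,
                            torch.tensor([tokenizer.eos_token_id])])

    input_ids = torch.cat([prompt_ids, answer_ids])
    # Answer tokens are plain language tokens; extend token_type_ids with the
    # text token type (0 in the released modeling script, as far as I can tell).
    token_type_ids = torch.cat([prompt_types, torch.zeros_like(answer_ids)])
    attention_mask = torch.ones_like(input_ids)

    # Labels: mask the whole prompt (including the image placeholder positions)
    # with IGNORE_INDEX and supervise only the answer tokens.
    labels = torch.cat([torch.full_like(prompt_ids, IGNORE_INDEX), answer_ids])

    return {
        'input_ids': input_ids[:max_length],
        'token_type_ids': token_type_ids[:max_length],
        'attention_mask': attention_mask[:max_length],
        'labels': labels[:max_length],
        'images': prompt['images'],
    }
```

Whether this is the intended recipe (in particular, whether the answer tokens should be appended after `build_conversation_input_ids` or passed through it, and whether the forward pass shifts the labels internally) is exactly what I am hoping the authors can clarify.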

Did you ever find instructions for fine-tuning? I am trying to create a dataset for fine-tuning CogVLM and have not been able to find much information. Thanks!
