How to finetune using DPO?

#31

by Maverick17 - opened Nov 12, 2024

Nov 12, 2024

Hello,

I have a standard DPO dataset with columns for images, rejected points, and chosen points, containing 2D coordinates for GUI visual grounding tasks. What prompt format is needed to correctly train the model using the DPO technique? The paper mentions that a 2D PixMo-Points dataset was used to train the model, but could you clarify the exact approach?
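For illustration only, here is a minimal sketch of how a dataset like the one described above could be reshaped into prompt/chosen/rejected records, the layout most DPO tooling expects. The column names (`image`, `chosen_point`, `rejected_point`), the instruction text, and the plain `(x, y)` serialization are all assumptions made for this example. They are not the prompt format used to train the model, which this thread does not specify.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def format_point(point: Point) -> str:
    """Serialize a 2D point as text.

    ASSUMPTION: a plain "(x, y)" string is a placeholder; the model's real
    point serialization is not given in this thread.
    """
    x, y = point
    return f"({x:.1f}, {y:.1f})"


def to_preference_record(row: dict, instruction: str) -> Dict[str, str]:
    """Turn one dataset row into a prompt/chosen/rejected record.

    ASSUMPTION: the row has hypothetical columns "image", "chosen_point",
    and "rejected_point".
    """
    return {
        "image": row["image"],
        "prompt": instruction,
        "chosen": format_point(row["chosen_point"]),
        "rejected": format_point(row["rejected_point"]),
    }


# Hypothetical GUI grounding row.
example_row = {
    "image": "screenshot_001.png",
    "chosen_point": (120.0, 48.5),
    "rejected_point": (300.0, 410.0),
}
record = to_preference_record(example_row, "Point to the Submit button.")
print(record)
```

Keeping the serialization in a single function means that only `format_point` has to change once the actual point format is known.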

amanrangapur

Ai2 org Nov 14, 2024

Hello @Maverick17, we will shortly release a paper with complete details of the dataset, training, and evaluation.

Maverick17

Nov 14, 2024

Hello @amanrangapur, does "shortly" mean by the end of this week or by the end of November? :)

I'm really looking forward to the release of the dataset and the training and eval scripts!

amanrangapur

Ai2 org Nov 15, 2024

Hi @Maverick17, I mean the last week of November.

Maverick17

Nov 26, 2024

Hello @amanrangapur, what is the status of the data release? We are nearing the end of November :)

amanrangapur

Ai2 org Nov 26, 2024

Hey @Maverick17, we're planning to release it this week. Stay tuned.

Maverick17

Dec 4, 2024

@amanrangapur It seems you guys are still not ready...

amanrangapur

Ai2 org Dec 5, 2024 (edited Dec 12, 2024)

Hi @Maverick17, a subset of the dataset is out; check this: https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b
Training code, evals, and checkpoints are here: https://github.com/allenai/molmo
