This is the model repository for the paper EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data.
The model is fine-tuned from Monkey. To speed up training, we also made some minor modifications:
- Instead of using the LoRA adapters in Monkey, the five patches of the raw image are stacked in an extra batch dimension and sent to the image encoder together, so they are processed at the same time.
- Inside the image encoder, we use flash attention instead of the manually implemented attention.
- Image reading is separated from forward propagation and moved into dataset preprocessing, so that the `DataLoader` in PyTorch can speed up image reading.
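
The three modifications above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: `PatchDataset`, `encode_patches`, `manual_attention`, and the toy encoder are hypothetical names, random tensors stand in for decoded images, and PyTorch's built-in `scaled_dot_product_attention` (PyTorch 2.0 or later) stands in for whichever flash attention implementation the model actually uses. It also assumes that "extra batch dimension" means folding the patch axis into the batch axis.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, Dataset


class PatchDataset(Dataset):
    """Hypothetical dataset: image loading happens here, not in forward()."""

    def __init__(self, num_samples: int, num_patches: int = 5, size: int = 32):
        self.num_samples = num_samples
        self.num_patches = num_patches
        self.size = size

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Stand-in for reading an image from disk and cropping it into patches.
        gen = torch.Generator().manual_seed(idx)
        return torch.rand(self.num_patches, 3, self.size, self.size, generator=gen)


def encode_patches(encoder: nn.Module, patches: torch.Tensor) -> torch.Tensor:
    """Fold the patch axis into the batch axis, encode once, then unfold."""
    b, p, c, h, w = patches.shape
    feats = encoder(patches.reshape(b * p, c, h, w))  # one pass for all patches
    return feats.reshape(b, p, -1)


def manual_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Reference attention written out by hand, for comparison only."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v


# Dataset preprocessing plus batching is handled by the DataLoader.
toy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 16))
loader = DataLoader(PatchDataset(num_samples=8), batch_size=4, num_workers=0)
batch = next(iter(loader))                      # shape: (4, 5, 3, 32, 32)
features = encode_patches(toy_encoder, batch)   # shape: (4, 5, 16)

# Fused attention kernel versus the hand-written version.
torch.manual_seed(0)
q, k, v = (torch.randn(2, 4, 10, 8) for _ in range(3))  # (batch, heads, tokens, dim)
fused = F.scaled_dot_product_attention(q, k, v)
reference = manual_attention(q, k, v)
```

In a real training run `num_workers` would typically be greater than zero so that image decoding overlaps with the forward and backward passes; it is left at zero here only to keep the sketch portable.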
The training dataset (i.e., all training QAs in `.jsonl` format, excluding images) is published in the repository EDGE-Dataset.
The model training and inference scripts are published in the anonymous repository EDGE.
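
As a rough illustration of how a `.jsonl` file of QAs can be consumed, the sketch below parses one JSON object per line. The field names `image`, `question`, and `answer` and the sample paths are hypothetical placeholders, not the actual schema of the published dataset; `read_jsonl` is an illustrative helper, not part of the released scripts.

```python
import io
import json
from typing import Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc


# Hypothetical records; the real field names may differ.
sample = io.StringIO(
    '{"image": "screens/0001.png", "question": "Where is the search box?", "answer": "top bar"}\n'
    "\n"
    '{"image": "screens/0002.png", "question": "What does the button do?", "answer": "submits the form"}\n'
)
records = list(read_jsonl(sample))
```

With a file on disk, the same helper can be used as `read_jsonl(open(path, encoding="utf-8"))`. Because the images are not included in the dataset, any image path stored in a record would have to be resolved against a separately obtained image directory.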
Base model: echo840/Monkey-Chat