---
license: mit
base_model:
- mistralai/Pixtral-12B-2409
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- lora
datasets:
- Multimodal-Fatima/FGVC_Aircraft_train
- takara-ai/FloodNet_2021-Track_2_Dataset_HF
---

# pixtral_aerial_VQA_adapter

## Model Details

- **Type**: LoRA Adapter
- **Total Parameters**: 6,225,920
- **Memory Usage**: 23.75 MB
- **Precisions**: torch.float32
- **Layer Types**:
  - lora_A: 40
  - lora_B: 40

## Intended Use

- **Primary intended uses**: Processing aerial footage of construction sites for structural and construction surveying.
- Can also be applied to any detailed VQA use cases with aerial footage.

## Training Data

- **Dataset**:
  1. FloodNet Track 2 dataset
  2. Subset of FGVC Aircraft dataset
  3. Custom dataset of 10 image-caption pairs created using Pixtral

## Training Procedure

- **Training method**: LoRA (Low-Rank Adaptation)
- **Base model**: Ertugrul/Pixtral-12B-Captioner-Relaxed
- **Training hardware**: Nebius-hosted NVIDIA H100 machine

## Citation

```bibtex
@misc{rahnemoonfar2020floodnet,
  title={FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding},
  author={Maryam Rahnemoonfar and Tashnim Chowdhury and Argho Sarkar and Debvrat Varshney and Masoud Yari and Robin Murphy},
  year={2020},
  eprint={2012.02951},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  doi={10.48550/arXiv.2012.02951}
}
```
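
## Notes

The memory figure listed under Model Details can be checked from the parameter count and the precision. The sketch below is only a sanity check, not part of the training code. It assumes that "MB" on this card means MiB (1024 × 1024 bytes) and that only the raw tensor values are counted, with no file metadata. The helper name `adapter_size_mib` is hypothetical.

```python
# Figures copied from the Model Details section of this card.
TOTAL_PARAMS = 6_225_920
BYTES_PER_FLOAT32 = 4  # torch.float32 stores each value in 4 bytes


def adapter_size_mib(num_params: int, bytes_per_param: int) -> float:
    """Return the raw weight size in MiB (assumption: 1 MiB = 1024 * 1024 bytes)."""
    return num_params * bytes_per_param / (1024 ** 2)


size = adapter_size_mib(TOTAL_PARAMS, BYTES_PER_FLOAT32)
print(f"{size:.2f} MiB")  # prints "23.75 MiB"
```

Under the same assumptions, passing `2` as `bytes_per_param` gives the size the adapter weights would have if stored in a 16-bit format, which is half the float32 figure.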