# BEiT v2 Fine-tuning on ImageNet-1k (Image Classification)
## Model Zoo

We provide some fine-tuned models here.
| model name | pre-training epochs on ImageNet-1k | intermediate fine-tuning epochs on ImageNet-21k | fine-tuning epochs on ImageNet-1k | weight | top-1 accuracy (%) |
|---|---|---|---|---|---|
| beit_base_patch16_224 | 300 | 0 | 100 | link | 85.0 |
| beit_base_patch16_224 | 1600 | 0 | 100 | link | 85.5 |
| beit_base_patch16_224 | 1600 | 90 | 30 | link | 86.5 |
| beit_large_patch16_224 | 300 | 0 | 50 | link | 86.6 |
| beit_large_patch16_224 | 1600 | 0 | 50 | link | 87.3 |
| beit_large_patch16_224 | 1600 | 90 | 20 | link | 88.4 |
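Each weight is a standard PyTorch `.pth` checkpoint. If you want to inspect a downloaded file before fine-tuning, here is a minimal sketch; the `model`/`module` keys are assumptions based on common checkpoint layouts, so print the keys to see what your file actually contains:

```python
import torch

# Load a downloaded BEiT v2 checkpoint on CPU for inspection.
ckpt = torch.load("beitv2_base_patch16_224_pt1k_ft21k.pth", map_location="cpu")

# The state dict often sits under a "model" or "module" key
# (an assumption -- fall back to the checkpoint itself otherwise).
state_dict = ckpt.get("model", ckpt.get("module", ckpt))

# Show a few parameter names and shapes.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```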
## Example: Fine-tuning BEiT v2 on ImageNet-1k (Image Classification)

The BEiT v2 base model can be fine-tuned on ImageNet-1k using a DGX box (8 V100-32GB):
```bash
python -m torch.distributed.launch --nproc_per_node=8 run_class_finetuning.py \
    --data_path /path/to/imagenet-1k/train \
    --eval_data_path /path/to/imagenet-1k/val \
    --nb_classes 1000 \
    --data_set image_folder \
    --output_dir /path/to/save/your_model \
    --log_dir /path/to/save/your_model \
    --model beit_base_patch16_224 \
    --weight_decay 0.05 \
    --finetune "https://conversationhub.blob.core.windows.net/beit-share-public/beitv2/beitv2_base_patch16_224_pt1k_ft21k.pth?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D" \
    --batch_size 128 \
    --lr 5e-5 \
    --update_freq 1 \
    --warmup_epochs 20 \
    --epochs 30 \
    --layer_decay 0.75 \
    --drop_path 0.1 \
    --mixup 0. \
    --cutmix 0. \
    --imagenet_default_mean_and_std \
    --dist_eval \
    --save_ckpt_freq 20 \
    --enable_deepspeed
```
- `--batch_size`: batch size per GPU. Effective batch size = number of GPUs * `--batch_size` * `--update_freq`. So in the above example, the effective batch size is `8*128*1 = 1024` (see the sketch after this list).
- `--finetune`: weight path of your pretrained model; you can pretrain it by yourself, or download the pretrained model weights in PRETRAINING.md.
- `--lr`: learning rate. 5e-4 for pretrained models and 5e-5 for intermediate fine-tuned models.
- `--epochs`: fine-tuning epochs. 100 for pretrained models and 30 for intermediate fine-tuned models.
- `--mixup`: 0.8 for pretrained models and 0. for intermediate fine-tuned models.
- `--cutmix`: 1.0 for pretrained models and 0. for intermediate fine-tuned models.
- `--layer_decay`: 0.6 for models pretrained for 1600 epochs and 0.65 for 300 epochs; 0.75 for intermediate fine-tuned models.
- `--drop_path`: 0.2 for models pretrained for 1600 epochs and 0.1 for 300 epochs; 0.1 for intermediate fine-tuned models.
- `--enable_deepspeed`: optional.
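The effective-batch-size arithmetic from the `--batch_size` bullet is easy to sanity-check directly; a minimal sketch using the values from the base-model command above:

```python
# Effective batch size = number of GPUs * per-GPU batch size * update frequency.
def effective_batch_size(num_gpus: int, batch_size: int, update_freq: int) -> int:
    return num_gpus * batch_size * update_freq

# Base-model command above: 8 GPUs, --batch_size 128, --update_freq 1.
assert effective_batch_size(8, 128, 1) == 1024
```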
The BEiT v2 large model can be fine-tuned on ImageNet-1k using a DGX box (8 V100-32GB):
```bash
python -m torch.distributed.launch --nproc_per_node=8 run_class_finetuning.py \
    --data_path /path/to/imagenet-1k/train \
    --eval_data_path /path/to/imagenet-1k/val \
    --nb_classes 1000 \
    --data_set image_folder \
    --output_dir /path/to/save/your_model \
    --log_dir /path/to/save/your_model \
    --model beit_large_patch16_224 \
    --weight_decay 0.05 \
    --finetune "https://conversationhub.blob.core.windows.net/beit-share-public/beitv2/beitv2_large_patch16_224_pt1k_ft21k.pth?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D" \
    --batch_size 64 \
    --lr 7e-5 \
    --update_freq 2 \
    --warmup_epochs 5 \
    --epochs 20 \
    --layer_decay 0.8 \
    --drop_path 0.25 \
    --mixup 0. \
    --cutmix 0. \
    --imagenet_default_mean_and_std \
    --dist_eval \
    --save_ckpt_freq 10 \
    --enable_deepspeed
```
- `--batch_size`: batch size per GPU. Effective batch size = number of GPUs * `--batch_size` * `--update_freq`. So in the above example, the effective batch size is `8*64*2 = 1024`.
- `--finetune`: weight path of your pretrained model; you can pretrain it by yourself, or download the pretrained model weights in PRETRAINING.md.
- `--lr`: learning rate. 2e-4 for the 1600-epoch pretrained model and 5e-4 for the 300-epoch one; 7e-5 for intermediate fine-tuned models.
- `--epochs`: fine-tuning epochs. 50 for pretrained models and 20 for intermediate fine-tuned models.
- `--mixup`: 0.8 for pretrained models and 0. for intermediate fine-tuned models.
- `--cutmix`: 1.0 for pretrained models and 0. for intermediate fine-tuned models.
- `--layer_decay`: 0.8 for all models (see the layer-wise decay sketch after this list).
- `--drop_path`: 0.2 for pretrained models and 0.25 for intermediate fine-tuned models.
- `--enable_deepspeed`: optional.
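For intuition on `--layer_decay`, the sketch below shows the usual layer-wise learning-rate decay scheme for ViT-style fine-tuning: each block's learning rate is scaled by `layer_decay` raised to its distance from the top, so early layers change more slowly. This is a simplified illustration, not the exact parameter grouping in `run_class_finetuning.py`:

```python
# Layer-wise LR decay: layer i of n gets scale layer_decay ** (n - i),
# so the top block trains at the full base LR and earlier layers at less.
def lr_per_layer(base_lr: float, num_layers: int, layer_decay: float) -> list[float]:
    return [base_lr * layer_decay ** (num_layers - i) for i in range(num_layers + 1)]

# Large-model command above: --lr 7e-5, --layer_decay 0.8, 24 transformer blocks.
lrs = lr_per_layer(7e-5, 24, 0.8)
print(f"embedding lr: {lrs[0]:.2e}, top block lr: {lrs[-1]:.2e}")
```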
## Example: Evaluate BEiT v2 Fine-tuned Model on ImageNet-1k (Image Classification)
- Evaluate our fine-tuned BEiT-base model on ImageNet val with a single GPU:
```bash
python -m torch.distributed.launch --nproc_per_node=1 run_class_finetuning.py \
    --data_path /path/to/imagenet-1k/train \
    --eval_data_path /path/to/imagenet-1k/val \
    --nb_classes 1000 \
    --data_set image_folder \
    --model beit_base_patch16_224 \
    --finetune "https://conversationhub.blob.core.windows.net/beit-share-public/beitv2/beitv2_base_patch16_224_pt1k_ft21kto1k.pth?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D" \
    --batch_size 128 \
    --imagenet_default_mean_and_std \
    --dist_eval \
    --eval
```
Expected results:
```
* Acc@1 86.458 Acc@5 97.978 loss 0.569
```
- Evaluate our fine-tuned BEiT-large model on ImageNet val with a single GPU:
```bash
python -m torch.distributed.launch --nproc_per_node=1 run_class_finetuning.py \
    --data_path /path/to/imagenet-1k/train \
    --eval_data_path /path/to/imagenet-1k/val \
    --nb_classes 1000 \
    --data_set image_folder \
    --model beit_large_patch16_224 \
    --finetune "https://conversationhub.blob.core.windows.net/beit-share-public/beitv2/beitv2_large_patch16_224_pt1k_ft21kto1k.pth?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D" \
    --batch_size 128 \
    --imagenet_default_mean_and_std \
    --dist_eval \
    --eval
```
Expected results:
```
* Acc@1 88.370 Acc@5 98.578 loss 0.493
```
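The reported `Acc@1`/`Acc@5` are standard top-k accuracies; below is a minimal sketch of how they are typically computed from model logits (illustrative, not the repo's exact evaluation code):

```python
import torch

def topk_accuracy(logits: torch.Tensor, target: torch.Tensor, ks=(1, 5)):
    """Percentage of samples whose true label is among the top-k predictions."""
    _, pred = logits.topk(max(ks), dim=1)    # (batch, max_k) class indices
    correct = pred.eq(target.unsqueeze(1))   # (batch, max_k) booleans
    return [correct[:, :k].any(dim=1).float().mean().item() * 100 for k in ks]

logits = torch.randn(8, 1000)            # fake batch of ImageNet-1k logits
target = torch.randint(0, 1000, (8,))    # fake labels
acc1, acc5 = topk_accuracy(logits, target)
print(f"Acc@1 {acc1:.3f} Acc@5 {acc5:.3f}")
```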
## Robust Evaluation on ImageNet Variants (Image Classification)
Download the datasets (ImageNet-Adversarial, ImageNet-Rendition, and ImageNet-Sketch) following timm.

For ImageNet-Rendition, for example, run:
```bash
python -m torch.distributed.launch --nproc_per_node=1 run_class_finetuning.py \
    --robust_test 'imagenet_r' \
    --data_path /path/to/imagenet-r \
    --eval_data_path /path/to/imagenet-r \
    --nb_classes 200 \
    --data_set image_folder \
    --model beit_large_patch16_224 \
    --finetune "https://conversationhub.blob.core.windows.net/beit-share-public/beitv2/beitv2_large_patch16_224_pt1k_ft1k.pth?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D" \
    --batch_size 128 \
    --imagenet_default_mean_and_std \
    --dist_eval \
    --save_ckpt_freq 20 \
    --eval
```
- `--robust_test`: `imagenet_r` for ImageNet-Rendition and `imagenet_a` for ImageNet-Adversarial.
- `--nb_classes`: 200 for ImageNet-Rendition and ImageNet-Adversarial, and 1000 for ImageNet-Sketch (see the class-subset sketch after the expected results below).
Expected results:
```
* Acc@1 69.940 Acc@5 82.890 loss 1.541
```
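ImageNet-Rendition covers only 200 of the 1000 ImageNet-1k classes, which is why `--nb_classes` differs. A common way to evaluate a 1000-class model on such a subset is to keep only the logits of the classes present in the variant. A hedged sketch, where `IMAGENET_R_INDICES` is a placeholder for the 200 class indices published with the dataset:

```python
import torch

# Placeholder: the real 200-entry list of ImageNet-1k class indices present
# in ImageNet-R ships with the dataset; these few values are illustrative only.
IMAGENET_R_INDICES = [1, 2, 4, 6, 8]

def restrict_to_subset(logits_1k: torch.Tensor) -> torch.Tensor:
    """Map 1000-class logits onto the variant's class subset."""
    return logits_1k[:, IMAGENET_R_INDICES]

logits_1k = torch.randn(4, 1000)
subset_logits = restrict_to_subset(logits_1k)  # shape (4, len(subset))
pred = subset_logits.argmax(dim=1)             # indices in the subset label space
```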
## Example: Intermediate Fine-tuning BEiT v2 on ImageNet-21k

The BEiT v2 base/large model can be intermediate fine-tuned on ImageNet-21k using 2 DGX boxes (16 V100-32GB each):
```bash
python -m torch.distributed.launch --nnodes 2 --node_rank {0, 1} --nproc_per_node=16 run_class_finetuning.py \
    --data_path /path/to/imagenet-21k/train \
    --disable_eval_during_finetuning \
    --nb_classes 21841 \
    --data_set image_folder \
    --output_dir /path/to/save/your_model \
    --log_dir /path/to/save/your_model \
    --model beit_base_patch16_224 \
    --weight_decay 0.05 \
    --finetune /path/to/save/your_pretraining_model \
    --batch_size 128 \
    --lr 6e-4 \
    --update_freq 1 \
    --warmup_epochs 20 \
    --epochs 90 \
    --layer_decay 0.75 \
    --drop_path 0.1 \
    --mixup 0.8 \
    --cutmix 1.0 \
    --imagenet_default_mean_and_std \
    --save_ckpt_freq 20 \
    --enable_deepspeed
```
- `--batch_size`: batch size per GPU. Effective batch size = number of GPUs * `--batch_size` * `--update_freq`. So in the above example, the effective batch size is `32*128*1 = 4096`.
- `--finetune`: weight path of your pretrained model; you can pretrain it by yourself, or download the pretrained model weight in PRETRAINING.md.
- `--drop_path`: 0.1 for the base model and 0.2 for the large model.
- `--enable_deepspeed`: optional.
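All of the commands above pass `--data_set image_folder`, which expects a `torchvision`-style directory tree: one sub-directory per class with the images inside. A minimal sketch of loading such a tree (illustrative; the training script builds its own transforms):

```python
from torchvision import datasets, transforms

# image_folder layout: /path/to/imagenet-1k/train/<class_name>/<image>.jpeg
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # ImageNet default mean/std, matching --imagenet_default_mean_and_std.
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

dataset = datasets.ImageFolder("/path/to/imagenet-1k/train", transform=transform)
print(len(dataset.classes), "classes,", len(dataset), "images")
```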
We provide some intermediate fine-tuned models here.