microsoft/dit-base-finetuned-rvlcdip

nielsr (HF staff) committed on Mar 8, 2022

Commit e799cfb • 1 Parent(s): 0e04cb2

Update README.md

Files changed (1)

README.md +8 -0

@@ -1,3 +1,11 @@
+---
+tags:
+- vision
+- image-classification
+widget:
+- src: coca_cola_advertisement.png
+---
 # Document Image Transformer (base-sized model)
 Document Image Transformer (DiT) model pre-trained on IIT-CDIP (Lewis et al., 2006), a dataset that includes 42 million document images and fine-tuned on [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/), a dataset consisting of 400,000 grayscale images in 16 classes, with 25,000 images per class. It was introduced in the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Li et al. and first released in [this repository](https://github.com/microsoft/unilm/tree/master/dit). Note that DiT is identical to the architecture of [BEiT](https://huggingface.co/docs/transformers/model_doc/beit).
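
For readers who want to try the checkpoint this README describes, here is a minimal sketch of document classification with the `transformers` library. It is not taken from the commit. It assumes `torch`, `Pillow`, and a `transformers` version that provides `AutoImageProcessor` are installed, and that a local copy of the widget image `coca_cola_advertisement.png` is available. The helper names `top_k_labels` and `classify_document` are hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Tuple


def top_k_labels(
    logits: List[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, logit) pairs, best first."""
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    return [(id2label[idx], score) for idx, score in ranked[:k]]


def classify_document(image_path: str, k: int = 3) -> List[Tuple[str, float]]:
    """Classify one document image into the RVL-CDIP classes (hypothetical helper)."""
    # Heavy imports are local so top_k_labels stays usable without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    checkpoint = "microsoft/dit-base-finetuned-rvlcdip"
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(checkpoint)

    # RVL-CDIP images are grayscale; converting to RGB gives the processor
    # the three channels it expects.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits[0]

    # id2label maps class indices to the class names stored in the model config.
    return top_k_labels(logits.tolist(), model.config.id2label, k)


# Example call (downloads the checkpoint on first use):
# print(classify_document("coca_cola_advertisement.png"))
```

The ranking step is kept separate from model loading so it can be checked without downloading weights. The values it returns are raw logits, not probabilities.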