codedrainer committed on
Commit
6ad4e1c
1 Parent(s): aca99b0

Update README.md

  1. README.md +48 -0
README.md CHANGED
@@ -1,3 +1,51 @@
  ---
  license: mit
+ tags:
+ - donut
+ - uae-license
+ - vision
  ---
+
+ # Donut (base-sized model, fine-tuned on RVL-CDIP)
+
+ Donut model fine-tuned on RVL-CDIP. It was introduced in the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim et al. and first released in [this repository](https://github.com/clovaai/donut).
+
+ Disclaimer: The team releasing Donut did not write a model card for this model, so this model card was written by the Hugging Face team.
+
+ ## Model description
+
+ Donut consists of a vision encoder (Swin Transformer) and a text decoder (BART). Given an image, the encoder first encodes it into a tensor of embeddings of shape `(batch_size, seq_len, hidden_size)`. The decoder then generates text autoregressively, conditioned on the encoder's output.
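
To make the `(batch_size, seq_len, hidden_size)` shape concrete, the arithmetic can be sketched as follows. This is a minimal illustration that assumes a standard four-stage Swin layout (patch size 4, three 2x patch-merging steps, channel width doubling at each merge). The helper name `swin_output_shape` and the example values (a 2560x1920 input and a base embedding width of 128) are assumptions for illustration and are not read from this checkpoint's config.

```python
def swin_output_shape(batch_size, height, width,
                      patch_size=4, embed_dim=128, num_stages=4):
    """Hypothetical helper: shape of the encoder output as a flat sequence.

    Assumes each of the (num_stages - 1) patch-merging steps halves the
    spatial resolution and doubles the channel width.
    """
    merges = num_stages - 1
    downsample = patch_size * 2 ** merges  # total spatial reduction factor
    if height % downsample or width % downsample:
        raise ValueError(
            f"image size must be divisible by {downsample}, got {height}x{width}"
        )
    seq_len = (height // downsample) * (width // downsample)
    hidden_size = embed_dim * 2 ** merges
    return batch_size, seq_len, hidden_size


# Assumed example input: one 2560x1920 image.
print(swin_output_shape(1, 2560, 1920))  # (1, 4800, 1024)
```

Under these assumptions the decoder attends over 4800 encoder positions per image, which is why input resolution has a direct effect on memory use.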
+
+ ![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/donut_architecture.jpg)
+
+ ## Intended uses & limitations
+
+ This model is fine-tuned on RVL-CDIP, a document image classification dataset.
+
+ See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/donut) for code examples.
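
Because the decoder emits text rather than a class index, the predicted label has to be read out of the generated sequence. The sketch below assumes the output follows the upstream Donut tagging convention, for example `<s_rvlcdip><s_class><advertisement/></s_class></s_rvlcdip>`; both that exact format and the helper name `parse_class` are assumptions for illustration. In practice, the `token2json` method of `DonutProcessor` is the usual way to do this conversion.

```python
import re


def parse_class(sequence, task_tag="rvlcdip"):
    """Hypothetical helper: extract the class label from a Donut-style
    tagged output string. Returns None if no class tag is found."""
    # Drop the task wrapper tags, e.g. <s_rvlcdip> and </s_rvlcdip>.
    body = re.sub(rf"</?s_{re.escape(task_tag)}>", "", sequence).strip()
    # The label is assumed to be a self-closing tag inside <s_class>...</s_class>.
    match = re.search(r"<s_class>\s*<([^<>/]+)/>\s*</s_class>", body)
    if match is None:
        return None
    return {"class": match.group(1)}


generated = "<s_rvlcdip><s_class><advertisement/></s_class></s_rvlcdip>"
print(parse_class(generated))  # {'class': 'advertisement'}
```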
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @article{DBLP:journals/corr/abs-2111-15664,
+   author     = {Geewook Kim and
+                 Teakgyu Hong and
+                 Moonbin Yim and
+                 Jinyoung Park and
+                 Jinyeong Yim and
+                 Wonseok Hwang and
+                 Sangdoo Yun and
+                 Dongyoon Han and
+                 Seunghyun Park},
+   title      = {Donut: Document Understanding Transformer without {OCR}},
+   journal    = {CoRR},
+   volume     = {abs/2111.15664},
+   year       = {2021},
+   url        = {https://arxiv.org/abs/2111.15664},
+   eprinttype = {arXiv},
+   eprint     = {2111.15664},
+   timestamp  = {Thu, 02 Dec 2021 10:50:44 +0100},
+   biburl     = {https://dblp.org/rec/journals/corr/abs-2111-15664.bib},
+   bibsource  = {dblp computer science bibliography, https://dblp.org}
+ }
+ ```