---
library_name: transformers
license: apache-2.0
datasets:
- ds4sd/DocLayNet
pipeline_tag: image-segmentation
---

# DETR-layout-detection

We present the model cmarkea/detr-layout-detection, which extracts layout elements (Text, Picture, Caption, Footnote, etc.) from an image of a document. It is a fine-tuned version of [detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50), trained on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet) dataset.

The model jointly predicts masks and bounding boxes for document objects. It is well suited to processing document corpora that will be ingested into an open-domain question answering (ODQA) system.

The model extracts 11 entity types: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title.

## Performance

## Direct Use

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor
from transformers.models.detr import DetrForSegmentation

img_proc = AutoImageProcessor.from_pretrained(
    "cmarkea/detr-layout-detection"
)
model = DetrForSegmentation.from_pretrained(
    "cmarkea/detr-layout-detection"
)

# Load a document page as a PIL image (the path is a placeholder).
img = Image.open("page.png").convert("RGB")

with torch.inference_mode():
    inputs = img_proc(img, return_tensors='pt')
    output = model(**inputs)

threshold = 0.4

# PIL reports (width, height); the post-processors expect (height, width).
target_sizes = [img.size[::-1]]

# Segmentation masks, resized to the original image.
segmentation_mask = img_proc.post_process_segmentation(
    output,
    threshold=threshold,
    target_sizes=target_sizes
)

# Bounding boxes, with scores and label ids, in pixel coordinates.
bbox_pred = img_proc.post_process_object_detection(
    output,
    threshold=threshold,
    target_sizes=target_sizes
)
```

### Citation

```
@online{DeDetrLay,
  AUTHOR = {Cyrile Delestre},
  URL = {https://huggingface.co/cmarkea/detr-base-layout-detection},
  YEAR = {2024},
  KEYWORDS = {Image Processing ; Transformers ; Layout},
}
```
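
### Reading the predicted boxes

The `bbox_pred` value returned by `post_process_object_detection` in the Direct Use snippet is a list with one dictionary per image, holding `scores`, `labels`, and `boxes` (boxes as `x_min, y_min, x_max, y_max` in pixels). The following is a minimal sketch of how those tensors might be turned into readable records. The helper `format_detections` is hypothetical and not part of this model or of transformers, and the label ids in the example are made up; in practice the mapping would come from `model.config.id2label`.

```python
from typing import Any, Dict, List, Mapping


def _to_list(values: Any) -> list:
    """Turn a tensor/array (anything exposing .tolist()) or a sequence into a list."""
    return values.tolist() if hasattr(values, "tolist") else list(values)


def format_detections(
    detection: Mapping[str, Any],
    id2label: Mapping[int, str],
) -> List[Dict[str, Any]]:
    """Convert one image's detections into label/score/box records.

    Hypothetical helper: `detection` is assumed to be one element of the list
    returned by `post_process_object_detection`.
    """
    scores = _to_list(detection["scores"])
    labels = _to_list(detection["labels"])
    boxes = _to_list(detection["boxes"])

    records = []
    for score, label_id, box in zip(scores, labels, boxes):
        records.append({
            # Fall back to the raw id when the mapping has no entry for it.
            "label": id2label.get(int(label_id), str(label_id)),
            "score": round(float(score), 3),
            "box": [round(float(coord), 1) for coord in box],
        })

    # Most confident detections first.
    records.sort(key=lambda rec: rec["score"], reverse=True)
    return records


if __name__ == "__main__":
    # Dummy values standing in for bbox_pred[0]; the ids below are illustrative only.
    dummy = {
        "scores": [0.55, 0.91],
        "labels": [6, 9],
        "boxes": [[15.0, 100.0, 180.0, 300.0], [10.0, 20.0, 200.0, 80.0]],
    }
    for rec in format_detections(dummy, {6: "Picture", 9: "Text"}):
        print(rec)
```

With the real model, a call such as `format_detections(bbox_pred[0], model.config.id2label)` would be the expected usage, since the helper accepts tensors as well as plain lists.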