---
license: mit
---

# DPT 3.1 (BEiT backbone)

DPT (Dense Prediction Transformer) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. (2021) and first released in [this repository](https://github.com/isl-org/DPT). DPT uses the [BEiT](https://huggingface.co/docs/transformers/model_doc/beit) model as backbone and adds a neck + head on top for monocular depth estimation.

![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)

Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

This DPT model uses [BEiT](https://huggingface.co/docs/transformers/model_doc/beit) as its backbone and adds a neck + head on top for monocular depth estimation. The backbone extracts token features from several transformer stages, the neck reassembles and fuses them into image-like feature maps at multiple resolutions, and the head regresses a single-channel depth map from the fused features.

## How to use

Here is how to use this model for zero-shot depth estimation on an image:

```python
from transformers import DPTImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = DPTImageProcessor.from_pretrained("Intel/dpt-beit-base-384")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-beit-base-384")

# prepare image for the model
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size; PIL's image.size is (width, height), so reverse it
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction: scale to the 0-255 range and convert to an 8-bit grayscale image
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
```

or one can use the pipeline API:

```python
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="Intel/dpt-beit-base-384")
result = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"]
```
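The pipeline takes care of pre- and post-processing, returning both the raw depth tensor and a PIL image already resized to the input resolution. As a minimal follow-up sketch (the filename `depth.png` is just an illustrative choice, not part of the API), the depth map can be inspected or saved directly:

```python
# result["predicted_depth"] holds the raw depth tensor
# result["depth"] is a PIL image resized back to the input resolution
result["depth"].save("depth.png")  # "depth.png" is an arbitrary example filename
```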