---
library_name: transformers
language:
- en
- es
base_model:
- facebook/detr-resnet-101
---

# Model Card for Model ID

This DETR model detects handwritten and cursive text and generates bounding boxes for it. It was fine-tuned from the base model facebook/detr-resnet-101.
The dataset is still under development and may be released in a future version. The model mainly detects Spanish text.

Note: The model uses the default number of generated bounding boxes (`num_queries: 100`). Changing this value when using the model could lead to unexpected behavior.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** Rodrigo Alvarez
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Text Detection / Bounding Box generation
- **Language(s) (NLP):** en (default), es-MX (fine-tuned)
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** facebook/detr-resnet-101

### Model Sources [optional]

- **Repository:** [DETR TROCR Lab](https://github.com/rodrigoalvarez-20/detr_trocr_handwritten_text/development)
- **Paper [optional]:** *Work in progress*
- **Demo [optional]:** [Demo](https://github.com/rodrigoalvarez-20/detr_trocr_handwritten_text/blob/development/detr_lab.ipynb)

## Uses

### Direct Use

```python
from transformers import DetrForObjectDetection, DetrImageProcessor
import torch
import cv2
import supervision as sv

# User-defined constants
MODEL_CHECKPOINT = "Rodr16020/detr_handwriten_cursive_text_detection"
DEVICE = "cuda"
CONFIDENCE_THRESHOLD = 0.5  # Keep only the generated boxes with a confidence score >= this value
IOU_THRESHOLD = 0.5  # Defined for reference; the NMS call below uses its own threshold (0.1)
TEST_IMAGE = "demo.jpeg"  # Path to the test image

# Load the model and the image processor
img_proc = DetrImageProcessor.from_pretrained(MODEL_CHECKPOINT)
detr_model = DetrForObjectDetection.from_pretrained(
    pretrained_model_name_or_path=MODEL_CHECKPOINT,
    ignore_mismatched_sizes=True
).to(DEVICE)

# Read the image as a pixel matrix (height x width x channels)
image = cv2.imread(TEST_IMAGE)

# Inference
with torch.no_grad():
    # Preprocess the image and predict
    inputs = img_proc(images=image, return_tensors='pt').to(DEVICE)
    outputs = detr_model(**inputs)

    # Post-process: rescale the generated bounding box coordinates to the original image size
    target_sizes = torch.tensor([image.shape[:2]]).to(DEVICE)
    results = img_proc.post_process_object_detection(
        outputs=outputs,
        threshold=CONFIDENCE_THRESHOLD,
        target_sizes=target_sizes
    )[0]

# Extract all the generated boxes as [x_min, y_min, x_max, y_max] lists
boxes = results["boxes"].tolist()

# With the supervision library, use the generated coordinates to annotate the image and preview the boxes
box_annotator = sv.BoxAnnotator()
detections = sv.Detections.from_transformers(transformers_results=results).with_nms(threshold=0.1)
# One label per detection: its confidence score
labels = [f"{confidence:.2f}" for confidence in detections.confidence]
frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)

sv.plot_image(frame, (16, 16))
```

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.
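Before running the full pipeline from Direct Use, it may help to see what its two post-processing steps do. The sketch below illustrates, in plain Python, dropping boxes below a confidence threshold and then applying greedy non-maximum suppression (NMS) by IoU. It is an illustration only, not the implementation used by `transformers` or `supervision`; the function names `iou` and `filter_and_nms` and the sample boxes are hypothetical.

```python
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    confidence_threshold: float = 0.5,
    iou_threshold: float = 0.1,
) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    # Step 1: keep only boxes whose score reaches the confidence threshold.
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= confidence_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Step 2: greedy NMS. A box survives only if it does not overlap
    # an already kept, higher-scoring box by more than iou_threshold.
    kept: List[int] = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical sample data: two near-duplicate boxes on one text line,
# one box on another line, and one low-confidence box.
boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (10, 60, 110, 90), (200, 10, 260, 40)]
scores = [0.92, 0.80, 0.75, 0.30]

kept = filter_and_nms(boxes, scores)
print(kept)  # [0, 2]
```

With the default thresholds from Direct Use (confidence 0.5, NMS IoU 0.1), the near-duplicate box and the low-confidence box are discarded. A low IoU threshold such as 0.1 is aggressive: it removes almost any overlapping box, which may or may not suit densely written pages.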
[More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- Dataset Format: COCO
- Device: CUDA
- `WEIGHT_DECAY = 3e-3`
- `CLIP_GRAD = 1e-4  # 0.001`
- `BATCH_SIZE = 8`
- `ACC_BATCH = BATCH_SIZE * 4`
- `MODEL_LR = 5e-4` (In some articles the value is set to 5e-4, but that did not work in my case, so I tried this value and it works "well".)
- `BB_LR = 5e-4` (Same as above.)
- `MAX_EPOCHS = 300` (Use >= 50, but the model stops learning near step 70.)

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

A simple, tiny computer at the CIC research lab.
When fine-tuning, the model and data used a total of

#### Hardware

- ASRock Z370/OEM motherboard
- Corsair 4000D Airflow case
- Intel Core i7-8700K processor
- XPG Spectrix DDR4 RAM, 3200 MHz, 16 GB (x4)
- Western Digital WD My Passport external SSD, 1 TB
- NVIDIA GeForce RTX 4090 24GB
- Corsair RMX Series RM1000x power supply, 1000 W

#### Software

- transformers
- pytorch
- tensorboard
- cv2
- supervision

And possibly others

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]