A few issues with these models

#5
by dhdeco - opened

These models are fantastic (way, way, way better than any of the alternatives), just a few little issues for those who may be inclined to use them outside of docling-parse:

  • It's not possible to use them directly with from_pretrained since the locations are non-standard
  • The labels in the layout model appear to be off by one - List-item is recognized as Formula, Section-header as Picture, etc.

Is there a reason for this that I'm not understanding? Thanks again!

For those who can read French, a quick discussion of using the layout model: https://ecolingui.ca/fr/blog/analyse-mise-en-page-2/

Docling org

OK, there is no reason. Let us do a quick check ...

Is there a code snippet you could share?

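Sure - I've excluded the hub downloads for space, so processor_config_path and config_path are the local paths of the downloaded config files:

import os
from io import BytesIO

import requests
import torch
from PIL import Image, ImageDraw
from transformers import RTDetrForObjectDetection, RTDetrImageProcessorFast

# processor_config_path and config_path come from the hub downloads (not shown)
processor = RTDetrImageProcessorFast.from_json_file(processor_config_path)
model = RTDetrForObjectDetection.from_pretrained(os.path.dirname(config_path))
image = Image.open(BytesIO(requests.get("http://ecolingui.ca/pdf_page.png").content))
with torch.inference_mode():
    inputs = processor(images=image, return_tensors="pt")
    outputs = model(**inputs)
# Rescale the predicted boxes to the original image size (height, width)
img_results = processor.post_process_object_detection(
    outputs,
    target_sizes=[(image.height, image.width)],
)[0]
# Draw each detected box with its label name
draw = ImageDraw.Draw(image)
for label, box in zip(img_results["labels"], img_results["boxes"]):
    label = model.config.id2label[label.item()]
    box = [round(x) for x in box.tolist()]
    draw.rectangle(box, outline="red")
    draw.text((box[0], max(0, box[1] - 12)), label, fill="red")
image.save("annotated.png")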

Giving this output:
annotated.png
If I change model.config.id2label[label.item()] to model.config.id2label[label.item() + 1], then I get something more reasonable:

annotated2.png

I assume the issue is that the background class (which has ID 0) doesn't actually exist in the model.

Oh and also... I just noticed that your code also adds 1 to the predicted label ID ;-)

https://github.com/DS4SD/docling-ibm-models/blob/main/docling_ibm_models/layoutmodel/layout_predictor.py#L158
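For anyone who would rather fix the mapping once than add 1 at every lookup, here is a minimal sketch of shifting the label table. It assumes the explanation above is right, namely that the model never emits the background class, so predicted index i corresponds to the entry stored at i + 1. The helper name shift_id2label and the short example table are hypothetical and only for illustration; the real table is whatever model.config.id2label contains.

```python
def shift_id2label(id2label, offset=1):
    """Map each predicted class index to the label stored at index + offset.

    Assumption: the model's outputs skip the first `offset` entries of the
    table (here, the background class at ID 0).
    """
    shifted = {}
    for idx, name in id2label.items():
        target = int(idx) - offset
        if target >= 0:  # drop entries the model can never predict
            shifted[target] = name
    return shifted


# Hypothetical excerpt of a label table, for illustration only.
example_id2label = {
    0: "background",
    1: "Caption",
    2: "Footnote",
    3: "Formula",
    4: "List-item",
}

corrected = shift_id2label(example_id2label)
# With the original table, predicted index 3 reads as "Formula";
# with the shifted table it reads as "List-item".
print(corrected[3])
```

With a loaded model, the shifted table could be used in the drawing loop in place of model.config.id2label, so that the lookup stays corrected[label.item()] with no + 1.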
