A few issues with these models

by dhdeco - opened Jan 15

Jan 15

These models are fantastic (way, way, way better than any of the alternatives), just a few little issues for those who may be inclined to use them outside of docling-parse:

It's not possible to use them directly with from_pretrained since the locations are non-standard
The labels in the layout model appear to be off by one - List-item is recognized as Formula, Section-header as Picture, etc.

Is there a reason for this that I'm not understanding? Thanks again!
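
To illustrate the first point: once the repository has been downloaded locally, the model directory still has to be located by hand before it can be passed to from_pretrained. A minimal sketch of that lookup is below. The helper name find_model_dirs is hypothetical, and it assumes the config file uses the standard transformers name config.json, which may not match the actual file layout of these models.

```python
import os


def find_model_dirs(root, config_name="config.json"):
    """Return the sorted directories under `root` that directly contain `config_name`.

    Hypothetical helper: `config_name` defaults to the usual transformers
    file name, which is an assumption about the repository layout.
    """
    return sorted(
        dirpath
        for dirpath, _dirnames, filenames in os.walk(root)
        if config_name in filenames
    )
```

Each returned directory could then be tried as the argument to from_pretrained, instead of the repository root.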

dhdeco

Jan 15

For those who can read French, a quick discussion of using the layout model: https://ecolingui.ca/fr/blog/analyse-mise-en-page-2/

PeterWJStaar

Docling org Jan 16

OK, there is no reason. Let us do a quick check ...

Is there a code-snippet you could share?

dhdeco

Jan 16

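Sure - I've left out the hub downloads for space, so processor_config_path and config_path are the local paths of the already-downloaded config files:

import os
from io import BytesIO

import requests
import torch
from PIL import Image, ImageDraw
from transformers import RTDetrForObjectDetection, RTDetrImageProcessorFast

# processor_config_path and config_path come from the (omitted) hub downloads
processor = RTDetrImageProcessorFast.from_json_file(processor_config_path)
model = RTDetrForObjectDetection.from_pretrained(os.path.dirname(config_path))
image = Image.open(BytesIO(requests.get("http://ecolingui.ca/pdf_page.png").content))
# Run the layout model on the page image
with torch.inference_mode():
    inputs = processor(images=image, return_tensors="pt")
    outputs = model(**inputs)
# Convert raw outputs to boxes in the original image's pixel coordinates
img_results = processor.post_process_object_detection(
    outputs,
    target_sizes=[(image.height, image.width)],
)[0]
# Draw each detected box with its label as given by the model config
draw = ImageDraw.Draw(image)
for label, box in zip(img_results["labels"], img_results["boxes"]):
    label = model.config.id2label[label.item()]
    box = [round(x) for x in box.tolist()]
    draw.rectangle(box, outline="red")
    draw.text((box[0], max(0, box[1] - 12)), label, fill="red")
image.save("annotated.png")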

Giving this output:

If I change id2label[label.item()] to id2label[label.item() + 1] then I get something more reasonable:

dhdeco

Jan 16

I assume the issue is that the background class (which has ID 0) doesn't actually exist in the model.
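
If that assumption holds, the fix can be applied once to the mapping instead of at every lookup. A minimal sketch follows. The helper name shift_id2label is hypothetical, and toy_id2label is a made-up mapping that only mimics the symptom described above; it is not the model's real config.

```python
def shift_id2label(id2label, offset=1):
    """Re-key a label mapping so predicted ID i maps to the label stored at i + offset.

    Entries whose new key would be negative (here, the unused background
    class at ID 0) are dropped.
    """
    return {
        class_id - offset: name
        for class_id, name in id2label.items()
        if class_id - offset >= 0
    }


# Toy mapping for illustration only, NOT the real label list of the model.
toy_id2label = {
    0: "Background",
    1: "Formula",
    2: "List-item",
    3: "Picture",
    4: "Section-header",
}

fixed_id2label = shift_id2label(toy_id2label)
```

With this toy mapping, a predicted ID of 1 reads as Formula in the original mapping and as List-item in the shifted one, which is the same mismatch as in the annotated output.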

dhdeco

Jan 16

edited Jan 16

Oh and also... I just noticed that your code also adds 1 to the predicted label ID ;-)

https://github.com/DS4SD/docling-ibm-models/blob/main/docling_ibm_models/layoutmodel/layout_predictor.py#L158

Reggieag

Jan 28

It would be great if we could use models directly. I want to use just the layout model, but I have to create custom code to make it work.
