A few issues with these models
These models are fantastic (way, way, way better than any of the alternatives); there are just a few little issues for those who may be inclined to use them outside of docling-parse:
- It's not possible to use them directly with `from_pretrained`, since the locations are non-standard.
- The labels in the layout model appear to be off by one: `List-item` is recognized as `Formula`, `Section-header` as `Picture`, etc.
Is there a reason for this that I'm not understanding? Thanks again!
For those who can read French, a quick discussion of using the layout model: https://ecolingui.ca/fr/blog/analyse-mise-en-page-2/
OK, there is no reason. Let us do a quick check ...
Is there a code-snippet you could share?
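Sure - I've left out the Hub downloads for space:

```python
import os
from io import BytesIO

import requests
import torch
from PIL import Image, ImageDraw
from transformers import RTDetrForObjectDetection, RTDetrImageProcessorFast

# processor_config_path and config_path are local paths to the layout model's
# preprocessor config and model config, downloaded from the Hub (step omitted)
processor = RTDetrImageProcessorFast.from_json_file(processor_config_path)
model = RTDetrForObjectDetection.from_pretrained(os.path.dirname(config_path))

image = Image.open(BytesIO(requests.get("http://ecolingui.ca/pdf_page.png").content))

with torch.inference_mode():
    inputs = processor(images=image, return_tensors="pt")
    outputs = model(**inputs)

# Rescale boxes to the original image size; take the results for the single image
img_results = processor.post_process_object_detection(
    outputs,
    target_sizes=[(image.height, image.width)],
)[0]

draw = ImageDraw.Draw(image)
for label, box in zip(img_results["labels"], img_results["boxes"]):
    # Map the predicted class ID to its name via the model config
    label = model.config.id2label[label.item()]
    box = [round(x) for x in box.tolist()]
    draw.rectangle(box, outline="red")
    # Draw the label just above the box, clamped to the top edge of the image
    draw.text((box[0], max(0, box[1] - 12)), label, fill="red")
image.save("annotated.png")
```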
Giving this output:
If I change `id2label[label.item()]` to `id2label[label.item() + 1]`, I get something more reasonable:
I assume the issue is that the `background` class (which has ID 0) doesn't actually exist in the model.
Oh and also... I just noticed that your code also adds 1 to the predicted label ID ;-)
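If the background hypothesis is right, an alternative to adding 1 at every lookup is to rebuild the label map once. The sketch below is a minimal illustration under two assumptions: `id2label` has the background class at ID 0, and the model never predicts it. The default name `"background"` is itself an assumption (pass the name the config actually uses), and the map in the usage lines is a hypothetical, truncated one built only from the pairs reported in this thread, not the real config.

```python
from typing import Dict, Mapping


def shift_id2label(id2label: Mapping, background: str = "background") -> Dict[int, str]:
    """Drop the background entry at ID 0 and re-index the remaining labels,
    so that predicted ID i maps to the label previously stored at i + 1."""
    # Normalize keys to int (JSON configs store them as strings) and sort by ID
    items = sorted((int(k), v) for k, v in id2label.items())
    # Assumption: the background class sits at ID 0 and is never predicted
    if not items or items[0] != (0, background):
        raise ValueError("expected the background class at ID 0")
    return {new_id: label for new_id, (_, label) in enumerate(items[1:])}


# Hypothetical, truncated label map for illustration only (not the real config)
toy_id2label = {0: "background", 1: "Formula", 2: "List-item", 3: "Picture", 4: "Section-header"}
fixed = shift_id2label(toy_id2label)
print(fixed[1])  # List-item (the unshifted map said Formula)
print(fixed[3])  # Section-header (the unshifted map said Picture)
```

With a real model, one would presumably assign the result to `model.config.id2label` (and rebuild `label2id` to match) before running the drawing loop, so that `id2label[label.item()]` needs no offset.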