---
base_model:
- Ultralytics/YOLOv8
pipeline_tag: image-segmentation
license: agpl-3.0
---

## Text region detection from Finnish 19th century Court Records

The model segments digitized 19th century court record documents into four predefined text region categories. It was trained using yolov8x-seg by Ultralytics as the base model.

## Intended uses & limitations

The model has been trained to detect four types of text regions from digitized Finnish 19th century court record documents:

- text paragraphs
- marginalia
- page numbers
- case starts

Case starts are symbols that mark the start of a new court case in the data.

<img src='segmentation_example.jpg' width='700'>

Most of the training data consists of handwritten documents, but the model also appears to generalize quite well to typeset data.

## Training data

The training dataset consisted of 815 digitized and annotated 19th century court record documents, while the validation and test datasets each contained 102 annotated document images.

## Training procedure

This model was trained using 2 NVIDIA RTX A6000 GPUs with the following hyperparameters:

- image size: 640
- learning rate (lr0): 0.01
- train batch size: 32
- epochs: 100
- patience: 20 epochs
- optimizer: SGD
- scheduler: cosine learning rate scheduler (cos_lr=True)
- workers: 4

Default settings were used for the other training hyperparameters (find more information [here](https://docs.ultralytics.com/modes/train/#train-settings)).

Model training was performed using the following code:

```python
from ultralytics import YOLO

# Use the pretrained YOLOv8 segmentation model as the starting point
model = YOLO('yolov8x-seg.pt')

# Path to the .yaml file where the data location and object classes are defined
yaml_path = 'text_regions.yaml'

# Start model training with the defined parameters
model.train(
    data=yaml_path, name='model_name',
    epochs=100, patience=10, imgsz=640, batch=32,
    optimizer='SGD', lr0=0.01, cos_lr=True,
    seed=551, val=True, workers=4, device=[0, 1],
)
```
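
The training code above reads the data location and object classes from `text_regions.yaml`, whose contents are not shown here. As a rough sketch of what such a file might look like in the Ultralytics dataset format, the snippet below builds the YAML text in Python. The dataset root, the split folder names, and the class names and their order are all assumptions made for illustration, not the values used to train this model.

```python
# Hypothetical class names and order; the indices actually used in training are not documented here.
CLASS_NAMES = ["paragraph", "marginalia", "page_number", "case_start"]


def build_dataset_yaml(root, class_names):
    """Return the text of an Ultralytics-style dataset YAML file."""
    lines = [
        f"path: {root}",        # dataset root directory (assumed)
        "train: images/train",  # split folders relative to the root (assumed)
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    # One "index: name" entry per object class
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


yaml_text = build_dataset_yaml("datasets/text_regions", CLASS_NAMES)
print(yaml_text)
```

Saving the printed text as `text_regions.yaml` would give a file of the shape that `model.train(data=...)` expects, provided the paths and class indices match the annotated data.
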

## Evaluation results

Evaluation results using the validation dataset are listed below:

|Class|Images|Class instances|Box precision|Box recall|Box mAP50|Box mAP50-95|Mask precision|Mask recall|Mask mAP50|Mask mAP50-95|
|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|
|All|102|563|0.909|0.918|0.94|0.648|0.889|0.892|0.896|0.567|
|Paragraph|102|197|0.957|0.99|0.985|0.966|0.957|0.99|0.985|0.952|
|Marginalia|102|48|0.887|0.917|0.922|0.664|0.888|0.917|0.927|0.669|
|Page number|102|98|0.884|0.796|0.87|0.421|0.851|0.758|0.808|0.32|
|Case start|102|220|0.91|0.968|0.984|0.541|0.858|0.905|0.864|0.328|

More information on the performance metrics can be found [here](https://docs.ultralytics.com/guides/yolo-performance-metrics/).
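
Precision and recall are reported separately in the table. When a single per-class figure is more convenient, the two can be combined into an F1 score (their harmonic mean). F1 is not part of the reported results; the sketch below only derives it from the box precision and recall values listed in the table, as an illustration.

```python
# Box precision and recall per class, copied from the validation results table
box_metrics = {
    "Paragraph": (0.957, 0.99),
    "Marginalia": (0.887, 0.917),
    "Page number": (0.884, 0.796),
    "Case start": (0.91, 0.968),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for class_name, (precision, recall) in box_metrics.items():
    print(f"{class_name}: F1 = {f1_score(precision, recall):.3f}")
```

The same function could be applied to the mask precision and recall columns.
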

## Inference

If the model file `tuomiokirja_regions_04122023.pt` is saved as `\models\tuomiokirja_regions_04122023.pt` and the input image path is `\data\image.jpg`, inference can be performed using the following code:

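```python
from ultralytics import YOLO

# Initialize the model; raw strings keep the backslashes in the paths literal
# (otherwise '\t' in '\tuomiokirja...' would be read as a tab character)
model = YOLO(r'\models\tuomiokirja_regions_04122023.pt')

# Run inference; save=True also writes the annotated image to disk
prediction_results = model.predict(source=r'\data\image.jpg', save=True)
```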

More information on the available inference arguments can be found [here](https://docs.ultralytics.com/modes/predict/#inference-arguments).
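
The `prediction_results` returned by `model.predict` is a list with one result object per input image. As a minimal sketch of how the detected regions might be read out, the helper below collects the class label, confidence, and mask polygon size for each region. The function name `summarize_regions` and the output dictionary keys are hypothetical; the attributes it reads (`names`, `boxes.cls`, `boxes.conf`, `masks.xy`) are those of the Ultralytics result object.

```python
def summarize_regions(result):
    """Return one dict per detected text region in a single prediction result."""
    # For a segmentation model, masks is None when nothing was detected on the page
    if result.boxes is None or result.masks is None:
        return []
    regions = []
    for class_id, confidence, polygon in zip(
        result.boxes.cls, result.boxes.conf, result.masks.xy
    ):
        regions.append({
            "label": result.names[int(class_id)],  # class index mapped to class name
            "confidence": float(confidence),
            "polygon_points": len(polygon),        # number of (x, y) points in the mask outline
        })
    return regions


# Usage with the inference code above (requires the model file and the image):
# for result in prediction_results:
#     for region in summarize_regions(result):
#         print(region)
```

The polygon coordinates themselves are available in `result.masks.xy` in pixel units, should the regions need to be cropped or exported.
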