
I've been working through the first two lessons of the fastai course. For lesson one I trained a model to recognise my cat, Mr Blupus. Lesson two emphasises getting those models out into the world as some kind of demo or application. Gradio and Hugging Face Spaces make it super easy to get a prototype of your model on the internet.
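To give a feel for how little code a Gradio prototype needs, here is a minimal sketch rather than the exact app behind this demo. It assumes a fastai learner exported to a file called `model.pkl` (a hypothetical name), and the helper names `to_label_scores` and `build_demo` are made up for illustration.

```python
def to_label_scores(vocab, probs):
    """Pair each class name with its probability, the dict shape gr.Label accepts."""
    return {str(label): float(p) for label, p in zip(vocab, probs)}


def build_demo(model_path="model.pkl"):
    """Wrap an exported fastai learner in a simple image-in, label-out interface."""
    # Imported here so the helper above works without gradio or fastai installed.
    import gradio as gr
    from fastai.vision.all import PILImage, load_learner

    # Assumption: model_path points at a learner saved with learn.export().
    learn = load_learner(model_path)

    def classify(image):
        # gr.Image passes a numpy array by default; fastai converts it to a PILImage.
        _, _, probs = learn.predict(PILImage.create(image))
        return to_label_scores(learn.dls.vocab, probs)

    return gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label())
```

Calling `build_demo().launch()` would start the app locally. On Hugging Face Spaces a script along these lines typically lives in `app.py`.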

The redaction-detector model has an accuracy of ~96% on the validation dataset.

The Dataset

I downloaded a few thousand publicly available FOIA documents from a government website. I split the PDFs up into individual .jpg files and then used Prodigy to annotate the data. (This process was described in a blog post written last year.)
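As a rough illustration of the PDF-splitting step, here is a minimal sketch. It shows one way to do it, not necessarily the tooling used for this dataset: it assumes the third-party `pdf2image` package (which in turn needs Poppler installed), and the `page_image_name` naming scheme and the `dpi` default are hypothetical choices.

```python
from pathlib import Path
from typing import List


def page_image_name(pdf_path: Path, page_number: int) -> str:
    """Build a file name such as 'report_p003.jpg' (hypothetical naming scheme)."""
    return f"{pdf_path.stem}_p{page_number:03d}.jpg"


def split_pdf_to_jpgs(pdf_path: Path, out_dir: Path, dpi: int = 100) -> List[Path]:
    """Render each page of one PDF to its own .jpg file and return the saved paths."""
    # Imported here so the naming helper works without pdf2image installed.
    from pdf2image import convert_from_path  # assumption: pdf2image + Poppler

    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # one PIL image per page
    for number, page in enumerate(pages, start=1):
        target = out_dir / page_image_name(pdf_path, number)
        page.convert("RGB").save(target, "JPEG")
        saved.append(target)
    return saved
```

Keeping the source PDF's name and the page number in each file name makes it easy to trace an annotated image back to the document it came from.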

Training the model

I trained the model with fastai's flexible vision_learner, fine-tuning resnet18, which was both smaller than resnet34 (no surprises there) and less liable to early overfitting. I trained the model for 10 epochs.
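The training setup described above can be sketched in a few lines of fastai. This is a minimal sketch under stated assumptions, not the exact training script: it assumes the images sit in one sub-folder per class (for example `redacted/` and `unredacted/`, hypothetical names), and the 20% validation split, the 224-pixel resize, and the `train_redaction_classifier` name are illustrative choices.

```python
def train_redaction_classifier(data_dir, epochs=10):
    """Fine-tune a pretrained resnet18 on images stored in one folder per class."""
    # Imported here so the module can be loaded without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders,
        Resize,
        accuracy,
        resnet18,
        vision_learner,
    )

    # Assumption: labels come from the parent folder name of each image,
    # and a random 20% of the images are held out for validation.
    dls = ImageDataLoaders.from_folder(
        data_dir,
        valid_pct=0.2,
        seed=42,
        item_tfms=Resize(224),
    )

    learn = vision_learner(dls, resnet18, metrics=accuracy)

    # fine_tune first trains the new head for one epoch with the body frozen,
    # then trains the whole network for `epochs` epochs.
    learn.fine_tune(epochs)
    return learn
```

Swapping `resnet18` for `resnet34` in the `vision_learner` call is all it takes to compare the two architectures.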

