miguelcarv/resnet-152-text-detector

Image Classification


	# Model Card for ResNet-152 Text Detector
	This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR, for a total of ~140k images: 50% with text and 50% with no legible text.

	# Model Details
	## How to Get Started with the Model
	```python
	from PIL import Image
	import requests
	import torch
	from transformers import AutoImageProcessor, AutoModelForImageClassification

	# Load the fine-tuned binary classifier
	model = AutoModelForImageClassification.from_pretrained(
	    "miguelcarv/resnet-152-text-detector",
	)

	# Reuse the ResNet-50 preprocessing config; resizing is done manually below
	processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)

	# Fetch an image and resize it to the 300x300 training resolution
	url = "http://images.cocodataset.org/train2017/000000044520.jpg"
	image = Image.open(requests.get(url, stream=True).raw).convert('RGB').resize((300,300))

	inputs = processor(image, return_tensors="pt").pixel_values

	# Inference only, so skip gradient tracking
	with torch.no_grad():
	    outputs = model(inputs)

	# Convert the two logits into class probabilities
	logits_per_image = outputs.logits
	probs = logits_per_image.softmax(dim=1)
	print(probs)
	# tensor([[0.1085, 0.8915]])
	```
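The example above prints two class probabilities but stops short of a yes/no decision. A minimal sketch of that last step follows. The helper name `contains_text`, the default `threshold`, and above all `text_index=1` are assumptions made for illustration: the model card does not say which class index means "has legible text", so check `model.config.id2label` before relying on it.

```python
def contains_text(probs, text_index=1, threshold=0.5):
    """Return True if the probability at text_index meets the threshold.

    probs: the two class probabilities for one image, e.g. probs[0].tolist()
    text_index: ASSUMPTION, the class index that means "has legible text"
    threshold: hypothetical cut-off; raise it to reduce false positives
    """
    if len(probs) != 2:
        raise ValueError("expected probabilities for exactly two classes")
    return probs[text_index] >= threshold


# Probabilities printed by the example above
example = [0.1085, 0.8915]
print(contains_text(example))                 # True under the assumed index
print(contains_text(example, threshold=0.9))  # False: 0.8915 < 0.9
```

Keeping the index and threshold as parameters means the decision rule can be corrected without touching the inference code if the label mapping turns out to be the other way around.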
	# Training Details
	- Trained for three epochs
	- Resolution: 300x300
	- Learning rate: 5e-5
	- Optimizer: AdamW
	- Batch size: 64
	- Trained with FP32
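The settings listed above can be gathered into a small configuration object, which also gives a rough estimate of the number of optimizer steps. This is only a sketch: `TrainConfig` and `total_steps` are hypothetical names, the image count of 140,000 is the approximate figure from the model card, and the estimate assumes the last partial batch is kept and no gradient accumulation is used. The original training script is not shown in the card.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed under Training Details."""
    epochs: int = 3
    image_size: int = 300          # images are 300x300
    learning_rate: float = 5e-5
    optimizer: str = "AdamW"
    batch_size: int = 64
    precision: str = "fp32"


def total_steps(num_images, cfg):
    """Estimate optimizer steps, assuming the last partial batch is kept."""
    steps_per_epoch = math.ceil(num_images / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainConfig()
# ~140k images is the approximate dataset size from the model card
print(total_steps(140_000, cfg))  # 6564
```

In PyTorch, the optimizer line of this configuration would correspond to something like `torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)`.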