Spaces and utilities for creating datasets and getting them on the Hub
Note This Space extracts embedded text from PDFs and pushes the resulting text to a Hugging Face Hub dataset.
Note This Space converts one or more PDFs into a set of images, one per page, and optionally pushes the images to a Hugging Face dataset. This can help generate an initial dataset for annotation or further processing.
Note Corpus Creator is a tool for transforming a collection of text files into a Hugging Face dataset, perfect for various natural language processing (NLP) tasks. Whether you're preparing data for synthetic generation, building pipelines, or setting up annotation tasks, this app simplifies the process.
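As a rough illustration of the text-files-to-dataset idea described above, here is a minimal sketch. It is not Corpus Creator's actual code: the function names `build_records` and `push_records`, the `file_name` and `text` column names, and the `*.txt` pattern are all assumptions made for this example. The upload step assumes the `datasets` library is installed and that you are logged in to the Hub.

```python
from pathlib import Path


def build_records(folder: str, pattern: str = "*.txt") -> list[dict]:
    """Read every matching text file under `folder` into one record per file.

    Hypothetical helper: the column names are illustrative, not a fixed schema.
    """
    records = []
    for path in sorted(Path(folder).rglob(pattern)):
        # Replace undecodable bytes instead of failing on a single bad file.
        text = path.read_text(encoding="utf-8", errors="replace")
        records.append({"file_name": path.name, "text": text})
    return records


def push_records(records: list[dict], repo_id: str) -> None:
    """Upload the records as a dataset (assumes `datasets` and a Hub login)."""
    # Imported lazily so build_records works without the library installed.
    from datasets import Dataset

    Dataset.from_list(records).push_to_hub(repo_id)
```

A call might look like `push_records(build_records("my_texts"), "your-username/your-dataset")`, where both the folder and the repository ID are placeholders. Keeping one record per file makes it easy to trace each row back to its source document; splitting into smaller chunks would be a separate design decision.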