Thomas Wolf PRO

thomwolf

AI & ML interests

NLP and open-source :-)

thomwolf's activity

posted an update about 20 hours ago
liked a Space 3 days ago
replied to nyuuzyou's post 6 days ago
Reacted to nyuuzyou's post with 🔥 6 days ago
🖼️ Introducing Public Domain Pictures Dataset - nyuuzyou/publicdomainpictures

Dataset highlights:
- 644,412 public domain images with comprehensive metadata from publicdomainpictures.net
- English language metadata including titles, descriptions, and keywords
- Each entry contains rich metadata including:
  - Unique image ID and full-size image URLs
  - Detailed titles and descriptions
  - Keyword/tag collections
  - Creator attribution
- Released to the public domain under Creative Commons Zero (CC0) license
posted an update 6 days ago
replied to sequelbox's post 6 days ago
Reacted to sequelbox's post with 👍 6 days ago
Reacted to LukeNeumann's post with 👍🔥 6 days ago
Hello Hugging Face community!

I wanted to introduce myself and my company, @Overlaiapp. We are a collective of filmmakers, photographers, and AI engineers working on high-resolution (8K+) training data.

We plan to share a lot of our datasets with the community and are kicking things off with two curated datasets:

- Overlaiai/OregonCoastin4K

- Overlaiai/SubArcticPolarBear


Overlai.ai Dataset Features

🎥 Oversampled: Every clip is captured in stunning 8K resolution, delivering rich detail ideal for fine-tuning on scenic landscapes and ocean dynamics.

📸 Variance: Includes close-up details, slow-motion footage of crashing waves, sweeping landscapes, and wildlife shots.

📋 Detailed Metadata: Every clip is paired with structured metadata, including creative descriptions, precise camera movements, lens information, field of view calculations, and shot settings, ensuring AI models can fully understand and replicate real-world cinematography with accuracy.

⚙️ Consistency: Re-thinking training data at the point of capture by "overshooting" a subject, enabling models to learn more nuanced relationships and views across scenes.

🌅 Light: Shot during early morning and sunset light for optimal color contrast and dynamic range, maximizing visual quality for color- and lighting-sensitive tasks.

🔍 Curation: Curated specifically for machine learning, providing clean, high-quality data for next-generation model training.
upvoted an article 6 days ago