Dataset highlights:
- 644,412 public domain images with comprehensive metadata from publicdomainpictures.net
- English-language metadata including titles, descriptions, and keywords
- Each entry contains rich metadata including:
  - Unique image ID and full-size image URLs
  - Detailed titles and descriptions
  - Keyword/tag collections
  - Creator attribution
- Released to the public domain under the Creative Commons Zero (CC0) license
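If you want to poke at the metadata, a minimal sketch with the datasets library could look like this; the Hub ID and column names below are placeholders rather than the dataset's documented schema, so check the dataset card for the real ones:

```python
from datasets import load_dataset

# Placeholder dataset ID for illustration; substitute the actual Hub ID from the dataset card.
ds = load_dataset("your-namespace/publicdomainpictures-metadata", split="train")

sample = ds[0]
# Column names below are assumptions based on the description above; inspect ds.column_names first.
print(sample["id"])           # unique image ID
print(sample["image_url"])    # full-size image URL
print(sample["title"], sample["description"])
print(sample["keywords"])     # keyword/tag collection
print(sample["creator"])      # creator attribution
```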
I wanted to introduce myself and my company @Overlaiapp. We are a collective of filmmakers, photographers, and AI engineers working on high-resolution (8K+) training data.
We plan to share a lot of our datasets with the community and are kicking things off with two curated datasets:
- Oversampled: Every clip is captured in stunning 8K resolution, delivering rich detail ideal for fine-tuning on scenic landscapes and ocean dynamics.
- Variance: Includes close-up details, slow-motion footage of crashing waves, sweeping landscapes, and wildlife shots.
- Detailed Metadata: Every clip is paired with structured metadata, including creative descriptions, precise camera movements, lens information, field-of-view calculations, and shot settings, so models can fully understand and replicate real-world cinematography (see the sketch after this list).
- Consistency: Re-thinking training data at the point of capture by "overshooting" a subject, enabling models to learn more nuanced relationships and views across scenes.
- Light: Shot during early-morning and sunset light for optimal color contrast and dynamic range, maximizing visual quality for color- and lighting-sensitive tasks.
- Curation: Curated specifically for machine learning, providing clean, high-quality data for next-generation model training.
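As an example of the field-of-view calculations mentioned above: for a rectilinear lens, horizontal FOV follows directly from focal length and sensor width. A small sketch, where the formula is standard optics but the metadata field names are illustrative, not the dataset's actual schema:

```python
import math

def horizontal_fov_degrees(focal_length_mm: float, sensor_width_mm: float) -> float:
    """Horizontal field of view for a rectilinear lens: 2 * atan(sensor_width / (2 * focal_length))."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# Hypothetical per-clip metadata record; field names are illustrative only.
clip_meta = {
    "lens_focal_length_mm": 24.0,
    "sensor_width_mm": 36.0,          # full-frame sensor
    "camera_movement": "slow dolly-in",
    "shutter_speed": "1/120",
}

fov = horizontal_fov_degrees(clip_meta["lens_focal_length_mm"], clip_meta["sensor_width_mm"])
print(f"Horizontal FOV: {fov:.1f} degrees")   # ~73.7 degrees for a 24 mm lens on full frame
```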
Lotus is a new foundation model for monocular depth estimation. Compared to previous diffusion-based MDE models, Lotus is modified for dense prediction tasks. The authors also released a model for normal prediction. Find everything in this collection: merve/lotus-6718fb957dc1c85a47ca1210
If you have 300+ GB of VRAM, you can run Mochi from @genmo
A SOTA model that dramatically closes the gap between closed and open video generation models.
Mochi 1 introduces a revolutionary architecture featuring joint reasoning over 44,520 video tokens with full 3D attention. The model implements extended learnable rotary positional embeddings (RoPE) in three dimensions, with network-learned mixing frequencies for the space and time axes.
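To make that concrete, here is a rough sketch (not Genmo's implementation) of rotary position embeddings extended to three axes, with the per-axis rotation frequencies exposed as learnable parameters; the head size and shapes are made up for illustration:

```python
import torch
import torch.nn as nn

class Rope3D(nn.Module):
    """Sketch of 3D RoPE: the head dimension is split across the (t, h, w) axes and
    each axis rotates its slice with its own, learnable set of frequencies."""
    def __init__(self, head_dim: int = 192):  # illustrative size, divisible by 6
        super().__init__()
        assert head_dim % 6 == 0
        self.dim_per_axis = head_dim // 3
        # Standard log-spaced RoPE init, registered as parameters so the network can adapt them.
        init = 1.0 / (10000 ** (torch.arange(0, self.dim_per_axis, 2) / self.dim_per_axis))
        self.freqs = nn.ParameterList([nn.Parameter(init.clone()) for _ in range(3)])

    def forward(self, x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, head_dim); positions: (seq, 3) integer (t, h, w) coordinates.
        out = []
        for axis in range(3):
            xi = x[..., axis * self.dim_per_axis:(axis + 1) * self.dim_per_axis]
            angles = positions[:, axis:axis + 1].float() * self.freqs[axis]   # (seq, dim_per_axis/2)
            cos, sin = angles.cos(), angles.sin()
            x1, x2 = xi[..., 0::2], xi[..., 1::2]
            rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
            out.append(rotated.flatten(-2))
        return torch.cat(out, dim=-1)
```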
The model incorporates cutting-edge improvements, including:
- SwiGLU feedforward layers
- Query-key normalization for enhanced stability
- Sandwich normalization for controlled internal activations
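For anyone unfamiliar with these components, here is a generic PyTorch sketch of SwiGLU and query-key normalization; it illustrates the general techniques, not Mochi's actual code, and sandwich normalization is summarized in the trailing comment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

def qk_normalize(q: torch.Tensor, k: torch.Tensor, eps: float = 1e-6):
    """Query-key normalization: RMS-normalize q and k along the head dimension
    before the dot product, which keeps attention logits in a stable range."""
    q = q * torch.rsqrt(q.pow(2).mean(-1, keepdim=True) + eps)
    k = k * torch.rsqrt(k.pow(2).mean(-1, keepdim=True) + eps)
    return q, k

# Sandwich normalization wraps each sublayer with a norm on both sides:
#   x = x + norm_out(sublayer(norm_in(x)))
# which bounds the scale of activations flowing back into the residual stream.
```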
What is currently available? The base model delivers impressive 480p video generation with exceptional motion quality and prompt adherence. Released under the Apache 2.0 license, it's freely available for both personal and commercial applications.
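If you want to try the base model, a sketch along these lines should work with a recent diffusers release that includes MochiPipeline; the offload/tiling calls are optional memory savers, and the prompt and frame count are just examples:

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

# Load the open 480p base model.
pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()   # trade speed for lower peak VRAM
pipe.enable_vae_tiling()          # decode the video in tiles to save memory

frames = pipe(
    prompt="Aerial shot of waves crashing on a rocky coastline at sunset",
    num_frames=84,
    num_inference_steps=50,
).frames[0]
export_to_video(frames, "mochi_sample.mp4", fps=30)
```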
What's Coming? Genmo has announced Mochi 1 HD, scheduled for release later this year, which will feature:
- Enhanced 720p resolution
- Improved motion fidelity
- Better handling of complex scene warping
Plot twist: Size isn't everything in AI! A lean 32B parameter model just showed up to the party and outperformed a 70B one. Efficiency > Scale? The AI world just got more interesting...
Cohere For AI released Aya Expanse, a new family of multilingual models (8B and 32B) spanning 23 popular languages.
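The 8B checkpoint can be tried with a standard transformers chat setup along these lines; the Hub ID below is assumed, so double-check it on the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-expanse-8b"   # assumed Hub ID; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Aya Expanse is a chat model, so go through the chat template.
messages = [{"role": "user", "content": "Translate to Turkish: open models help everyone."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```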
Remember when @Google launched MediaPipe in an effort to create efficient on-device pipelines?
They've just unlocked the ability to run 7B+ parameter language models directly in your browser. This is a game-changer for on-device AI!
Yes, they are streaming 8.6 GB model files!
Currently, they have Gemma 2B/7B running, but imagine Dynamic LoRA, multimodal support, quantization, and you never leaving Chrome!
This is a significant technical advancement, especially in Memory Optimization:
- Redesigned the model-loading code to work around WebAssembly's 4 GB memory limit.
- Implemented asynchronous loading of transformer stack layers (28 for Gemma 1.1 7B).
- Reduced peak WebAssembly memory usage to less than 1% of previous requirements.
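The real implementation is C++ compiled to WebAssembly, but the layer-by-layer idea can be sketched conceptually: stream one layer's weights at a time, hand them to the GPU, and drop the host copy before touching the next layer, so peak host memory stays near one layer instead of the whole model. A rough Python illustration, with file layout and function names invented for the sketch:

```python
from pathlib import Path

NUM_LAYERS = 28  # e.g. Gemma 1.1 7B

def upload_to_gpu(name: str, blob: bytes) -> None:
    """Placeholder for handing a weight buffer to the accelerator (WebGPU in MediaPipe's case)."""
    ...

def load_model_layer_by_layer(weight_dir: Path) -> None:
    for i in range(NUM_LAYERS):
        # Only this layer's weights are ever resident on the host at once.
        blob = (weight_dir / f"layer_{i:02d}.bin").read_bytes()
        upload_to_gpu(f"layer_{i:02d}", blob)
        del blob  # release the host copy before loading the next layer
```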
Cross-Platform Compatibility:
- Compiled the C++ codebase to WebAssembly for broad browser support.
- Utilized the WebGPU API for native GPU acceleration in browsers.
Here's why this matters:
1. Privacy: No need to send data to remote servers.
2. Cost-Efficiency: Eliminates server expenses.
3. Offline Capabilities: Use powerful AI without an internet connection.
Hi everyone! I'm Alex, I'm 16, and I've been doing an internship at Hugging Face for a little over a week. I've already learned a lot about using and prompting LLMs. With @victor as tutor, I've just finished a Space that analyzes your feelings by prompting an LLM chat model. The aim is to extend it so that it can categorize Hugging Face posts.
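The core of it boils down to a classification prompt sent to a chat model. A minimal sketch with huggingface_hub's InferenceClient (not the Space's actual code; the model choice and prompt wording are illustrative):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Llama-3.1-8B-Instruct")  # illustrative model choice

def analyze_feelings(text: str) -> str:
    """Ask a chat model to label the feeling expressed in a short text."""
    messages = [
        {"role": "system", "content": "Classify the feeling in the user's text as one of: "
                                      "joy, sadness, anger, fear, surprise, neutral. Answer with the label only."},
        {"role": "user", "content": text},
    ]
    response = client.chat_completion(messages, max_tokens=10)
    return response.choices[0].message.content.strip()

print(analyze_feelings("I finally got my demo working after a week of debugging!"))
```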
I am pleased to announce that I have founded the University of Glasgow organization on Hugging Face. If you are affiliated with the University of Glasgow or have a relative who is, you can join the organization through the relevant link.
Wikimedia and Hugging Face seem naturally complementary: both are community-centred and value openness and consent. That's why I'd love to see more Wikipedia and other Wikimedia projects' datasets on Hugging Face, to advance machine learning with diverse, community-curated data! See my new article on the Hugging Face Hub for why and how to create more Wikimedia datasets on Hugging Face: https://huggingface.co/blog/frimelle/wikipedias-treasure-trove-ml-data
- Overview: The YouTube Commons Dataset is a comprehensive collection of 30 billion words from 15,112,121 original and automatically translated transcripts, drawn from 2,063,066 videos on YouTube.
- License: All videos are shared under the CC-BY license, with the majority (71%) in English.
- Applications: This dataset is ideal for training speech-to-text (ASR) and translation models.
- Utilization: The text can be used for model training and is republishable for reproducibility purposes (see the loading sketch below).
- Collaboration: This dataset is the result of a collaboration between the state start-up LANGU:IA, the French Ministry of Culture, and DINUM. It will be expanded in the coming months.
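For anyone who wants to explore the transcripts, here is a sketch of streaming them with the datasets library; the Hub ID below is assumed, so check the dataset card for the exact ID and field names:

```python
from datasets import load_dataset

# Stream instead of downloading: the full corpus is very large.
ds = load_dataset("PleIAs/YouTube-Commons", split="train", streaming=True)  # assumed Hub ID

for row in ds.take(3):
    # Field names vary; inspect the keys on first use before building a pipeline.
    print(sorted(row.keys()))
```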