Glif App's Remixes feature lets you slap a logo onto anything, seamlessly integrating the input image (logo) into various contexts. The result is stunning remixes that blend the input logo with generated images (img2img logo mapping).
The good folks at @nvidia and @Tsinghua_Uni have released LLaMA-Mesh - A Revolutionary Approach to 3D Content Generation!
This innovative framework enables the direct generation of 3D meshes from natural language prompts while maintaining strong language capabilities.
Here is the Architecture & Implementation!
>> Core Components
Model Foundation
- If you haven't guessed it yet, it's built on the LLaMA-3.1-8B-Instruct base model
- Maintains the original language capabilities while adding 3D generation
- Context length is set to 8,000 tokens
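Here is a minimal sketch of what that setup implies in code, assuming the Hugging Face transformers stack and the public base-model ID; the actual LLaMA-Mesh checkpoint and inference script may differ:

```python
# Hypothetical sketch: prompting a LLaMA-3.1-8B-Instruct-based model for a
# mesh. Swap in the fine-tuned LLaMA-Mesh weights where available; the base
# model alone won't produce good OBJ output.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed base-model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Meshes are plain text here: the model emits OBJ vertex/face lines inside
# its normal token stream, so generation needs no special output head.
prompt = "Create a 3D model of a chair in OBJ format."
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```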
3D Representation Strategy
- Uses the OBJ file format for mesh representation
- Quantizes vertex coordinates into 64 discrete bins per axis
- Sorts vertices by z-y-x coordinates, from lowest to highest
- Sorts faces by their lowest vertex index for consistency
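To make the representation concrete, here is a short Python sketch of the quantize-and-sort scheme described above; the normalization details and helper names are assumptions, not the authors' code:

```python
import numpy as np

def mesh_to_obj_text(vertices: np.ndarray, faces: list[list[int]], bins: int = 64) -> str:
    """Serialize a mesh as quantized OBJ text (illustrative sketch)."""
    # Normalize coordinates into [0, 1], then quantize to `bins` integer levels.
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    quant = np.floor((vertices - lo) / (hi - lo + 1e-9) * (bins - 1)).astype(int)

    # Sort vertices by z, then y, then x (lowest first); remap face indices.
    order = np.lexsort((quant[:, 0], quant[:, 1], quant[:, 2]))
    remap = {int(old): new for new, old in enumerate(order)}
    quant = quant[order]
    faces = [[remap[i] for i in face] for face in faces]
    faces.sort(key=min)  # order faces by their lowest vertex index

    lines = [f"v {x} {y} {z}" for x, y, z in quant]
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]  # OBJ is 1-based
    return "\n".join(lines)
```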
Data Processing Pipeline
- Filters meshes to a maximum of 500 faces for computational efficiency
- Applies random rotations (0°, 90°, 180°, 270°) for data augmentation
- Generates ~125k mesh variations from 31k base meshes
- Uses Cap3D-generated captions for text descriptions
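A hedged sketch of that filtering and augmentation step might look like this; the 500-face cap and the four rotation angles come from the summary above, while the rotation axis and data layout are assumptions:

```python
import numpy as np

def augment(vertices: np.ndarray, faces: list[list[int]]):
    """Filter complex meshes and emit four rotated variants (sketch)."""
    if len(faces) > 500:  # drop meshes above the face cap
        return []
    variants = []
    for deg in (0, 90, 180, 270):  # assumed: rotation about the vertical (z) axis
        t = np.radians(deg)
        rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                        [np.sin(t),  np.cos(t), 0.0],
                        [0.0,        0.0,       1.0]])
        variants.append((vertices @ rot.T, faces))
    return variants
```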
>> Training Framework
Dataset Composition
- 40% mesh generation tasks
- 20% mesh understanding tasks
- 40% general conversation (UltraChat dataset)
- 8x training turns for generation, 4x for understanding
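Only the 40/20/40 ratios come from the summary above; how batches are actually mixed isn't shown here, but a simple weighted sampler conveys the idea:

```python
import random

# Assumed task names; only the ratios are from the paper summary.
TASK_MIX = {
    "mesh_generation": 0.4,
    "mesh_understanding": 0.2,
    "general_chat": 0.4,  # UltraChat
}

def sample_task() -> str:
    """Pick the task type for the next training example."""
    tasks, weights = zip(*TASK_MIX.items())
    return random.choices(tasks, weights=weights, k=1)[0]
```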
Training Configuration
- Deployed on 32 A100 GPUs (for NVIDIA, this is literally in-house)
- 21,000 training iterations
- Global batch size: 128
- AdamW optimizer with a 1e-5 learning rate
- 30-step warmup with cosine scheduling
- Total training time: approximately 3 days (per the paper)
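In PyTorch terms, the stated optimizer and schedule could be wired up roughly like this (a sketch with a stand-in model, not the authors' training script):

```python
import math
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the 8B LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

warmup_steps, total_steps = 30, 21_000

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / warmup_steps                                   # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))                # cosine decay

# Call scheduler.step() once per training iteration.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```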
This research opens exciting possibilities for intuitive 3D content creation through natural language interaction. The future of digital design is conversational!
Mochi 1 from Genmo is the newest state-of-the-art open-source video generation model that you can use for free on your computer. This model is a breakthrough comparable to the very first Stable Diffusion model, but this time for video generation. In this tutorial, I am going to show you how to use the Genmo Mochi 1 video generation model locally on your Windows computer with the most advanced and very easy to use SwarmUI. SwarmUI is as fast as ComfyUI but as easy to use as the Automatic1111 Stable Diffusion web UI. Moreover, if you don't have a powerful GPU to run this model locally, I am going to show you how to use it on the best cloud providers, RunPod and Massed Compute.
Amazing Ultra Important Tutorials with Chapters and Manually Written Subtitles / Captions

Stable Diffusion 3.5 Large How To Use Tutorial With Best Configuration and Comparison With FLUX DEV: https://youtu.be/-zOKhoO9a5s
FLUX Full Fine-Tuning / DreamBooth Tutorial That Shows a Lot of Info Regarding the Latest SwarmUI: https://youtu.be/FvpWy1x5etM