It's not every day you see the No. 1 ranked paper of the day open-sourcing a very powerful image editing app!
Fascinating to see MagicQuill, an interactive image editing system that makes precise photo editing feel effortless by combining diffusion-based editing with a multimodal LLM that predicts your editing intent!
The system's architecture features three sophisticated components:
1. Editing Processor:
- Implements a dual-branch architecture integrated into a latent diffusion framework
- Uses PiDiNet for edge-map extraction and content-aware per-pixel inpainting
- Features a specialized UNet with zero-convolution layers for feature insertion
- Trains the control branch with denoising score matching
- Handles structural edits via scribble guidance and color edits via downsampled color blocks
- Maintains pixel-level control through VAE-based latent-space operations
(a rough, off-the-shelf sketch of this scribble-guided inpainting idea follows the list below)
2. Painting Assistor:
- Powered by a LLaVA multimodal LLM fine-tuned with Low-Rank Adaptation (LoRA)
- Trained on a custom dataset derived from Densely Captioned Images (DCI)
- Interprets user brushstrokes through specialized Q&A tasks for add/subtract/color operations
- Normalizes bounding-box coordinates for precise stroke localization
- Emits streamlined single-word/phrase outputs for real-time performance
(a prompting sketch for this step also follows the list below)
3. Idea Collector:
- Built as a modular ReactJS component library
- Supports cross-platform deployment via HTTP protocols
- Compatible with Gradio and ComfyUI frameworks
- Features comprehensive layer management and parameter adjustment capabilities
- Implements real-time canvas updates and preview generation
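To make the Editing Processor concrete, here is a minimal sketch of the scribble-guided inpainting idea using stock diffusers components. This is not the authors' released code: MagicQuill trains its own dual-branch control module, whereas this stand-in uses a pretrained scribble ControlNet plus a PiDiNet annotator, and the model IDs, file names, and prompt are illustrative assumptions.

```python
# Rough approximation, NOT the MagicQuill release: it mimics "scribble-guided
# inpainting through a zero-convolution control branch" with stock diffusers parts.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from controlnet_aux import PidiNetDetector  # PiDiNet edge extractor

# PiDiNet edge map from the user's source image (file names are placeholders)
pidinet = PidiNetDetector.from_pretrained("lllyasviel/Annotators")
source = Image.open("source.png").convert("RGB")
mask = Image.open("edit_mask.png").convert("L")   # white = region to repaint
edges = pidinet(source)                           # stands in for the user's scribbles

# A pretrained scribble ControlNet plays the role of the control branch
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Structural edit: the mask limits the change, the edge map steers the structure
result = pipe(
    prompt="add a small red scarf",   # in MagicQuill this comes from the Painting Assistor
    image=source,
    mask_image=mask,
    control_image=edges,
    num_inference_steps=30,
).images[0]
result.save("edited.png")
```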
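And here is a sketch of how a LoRA-tuned LLaVA could map a brushstroke, expressed as a normalized bounding box, to a short edit intent. The adapter path and the exact question wording are hypothetical; the post does not spell out MagicQuill's actual Q&A prompt format.

```python
# Sketch only: LoRA-adapted LLaVA answering a single-phrase "what should be added
# here?" question. The adapter id and prompt wording are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import PeftModel

base_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(base_id)
model = LlavaForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.float16
).to("cuda")
model = PeftModel.from_pretrained(model, "your-org/painting-assistor-lora")  # hypothetical adapter

image = Image.open("canvas_with_stroke.png").convert("RGB")
box = (0.42, 0.31, 0.58, 0.47)  # normalized x1, y1, x2, y2 of the brushstroke
question = (
    f"The user drew an additive brushstroke inside box {box}. "
    "In one word or a short phrase, what should be added there?"
)
prompt = f"USER: <image>\n{question} ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=10)  # short outputs keep latency low
print(processor.decode(out[0], skip_special_tokens=True))
```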
The system outperforms existing solutions like SmartEdit and BrushNet in edge alignment and color fidelity while maintaining seamless integration with popular AI frameworks.
What are your thoughts on AI-powered creative tools?
🎙️ Listen to an audio "podcast" of every single Hugging Face Daily Papers entry.
Now, "AI Paper Reviewer" project can automatically generates audio podcasts on any papers published on arXiv, and this is integrated into the GitHub Action pipeline. I sounds pretty similar to hashtag#NotebookLM in my opinion.
The audio podcast is powered by Google technologies: 1) Google DeepMind's Gemini 1.5 Flash model generates the podcast script, then 2) Google Cloud's Text-to-Speech model on Vertex AI synthesizes the voices, turning the script into natural-sounding speech (including the recently added "Journey" voice style).
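For illustration, here is a minimal sketch of that two-step flow using the public google-generativeai and google-cloud-texttospeech Python clients. It is not the repository's actual code, and the prompt, file names, and the specific "Journey" voice name are assumptions.

```python
# Minimal two-step sketch of the described pipeline (not the repo's code):
# Gemini 1.5 Flash writes the script, Google Cloud TTS voices it.
import google.generativeai as genai
from google.cloud import texttospeech

# 1) Generate a short podcast script from a paper abstract with Gemini 1.5 Flash
genai.configure(api_key="YOUR_API_KEY")          # a free-tier key is enough for batch use
model = genai.GenerativeModel("gemini-1.5-flash")
abstract = open("paper_abstract.txt").read()
script = model.generate_content(
    "Write a two-host podcast script (~300 words) discussing this paper:\n" + abstract
).text

# 2) Synthesize the script with a "Journey"-style voice (voice name is an assumption)
tts = texttospeech.TextToSpeechClient()
audio = tts.synthesize_speech(
    input=texttospeech.SynthesisInput(text=script),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Journey-F"
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)
with open("episode.mp3", "wb") as f:
    f.write(audio.audio_content)
```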
"AI Paper Reviewer" is also an open source project. Anyone can use it to build and own a personal blog on any papers of your interests. Hence, checkout the project repository below if you are interested in! : https://github.com/deep-diver/paper-reviewer
The project will soon support other models, including open-weight ones, for both the text-based content generation and the voice synthesis of the podcast. The only reason I chose the Gemini model is that it offers a free tier, which is enough to shape this project with non-realtime batch generation. I'm excited to see how others will use this tool to explore the world of AI research, so feel free to share your feedback and suggestions!
I've published a new dataset to simplify model merging.
This dataset makes it easier to find compatible architectures for model merging with @arcee_ai's mergekit, streamlining the automation of high-performing merge searches.
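As a purely illustrative sketch of how such a dataset could be used: the dataset's actual name and schema are not given in this post, so the repo id and column names below are hypothetical stand-ins for a catalog of Hub models annotated with their architectures.

```python
# Hypothetical usage: filter a model catalog down to architecturally compatible
# merge candidates before handing them to mergekit.
from datasets import load_dataset

ds = load_dataset("username/model-merge-compatibility", split="train")  # hypothetical id

# mergekit merges require architecturally compatible checkpoints,
# e.g. all Mistral-7B derivatives.
target_arch = "MistralForCausalLM"
candidates = [row["model_id"] for row in ds if row["architecture"] == target_arch]
print(f"{len(candidates)} mergeable candidates with architecture {target_arch}")
```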
INTRODUCING the Hugging Face AutoTrain Client! Fine-tuning models just got even easier! Now you can fine-tune SOTA models on any compatible dataset-model pair from the Hugging Face Hub using Python, running on Hugging Face servers. Choose from a range of GPU flavors, millions of model and dataset pairs, and 10+ tasks.
To try it, install autotrain-advanced using pip. You can skip the dependencies by installing with --no-deps, but then you'll need to install some of them by hand.