🪄 MagicQuill: AI that reads your mind for image edits! Point at what bugs you, and it suggests the perfect fixes. No more manual editing headaches. Try it here: AI4Editing/MagicQuill
Models 💻 Coding: The Qwen team released two Qwen2.5-Coder checkpoints, 32B and 7B. Infly released OpenCoder: 1.5B and 8B coding models, with instruction-SFT'd versions and their datasets! 💗
🖼️ Image/Video Gen: Alibaba's vision lab released In-Context LoRA -- 10 LoRA models on different themes based on Flux. Also, Mochi, the SOTA video generation model under an Apache 2.0 license, now comes natively supported in diffusers 👏
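If you want to try Mochi through diffusers, loading it looks roughly like this (a minimal sketch following the standard diffusers pipeline API; the prompt, frame count, and memory-saving calls are illustrative, so check the official docs for exact arguments):

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
)

# Keep memory in check on a single GPU
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

prompt = "Close-up of a cat chasing a butterfly in slow motion"
frames = pipe(prompt, num_frames=84).frames[0]
export_to_video(frames, "mochi.mp4", fps=30)
```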
🖼️ VLMs/Multimodal: NexaAIDev released Omnivision 968M, a new vision-language model aligned with DPO to reduce hallucinations; it also comes with GGUF ckpts 👏 Microsoft released LLM2CLIP, a new CLIP-like model with a longer context window, allowing complex text inputs and better search.
🎮 AGI?: Etched released Oasis 500M, a diffusion-based open-world model that takes keyboard input and outputs gameplay 🤯
Datasets: Common Corpus, a 2T-token text dataset with a permissive license for EN/FR, spanning various sources: code, science, finance, culture 📖
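If you want to poke at it without downloading 2T tokens, 🤗 datasets streaming mode works well (a minimal sketch; the repo id below is my assumption, so check the Hub for the exact name):

```python
from datasets import load_dataset

# Stream instead of downloading the full 2T-token corpus.
# NOTE: repo id is an assumption -- look up the exact dataset name on the Hub.
ds = load_dataset("PleIAs/common_corpus", split="train", streaming=True)
for example in ds.take(3):
    print(example)  # inspect fields; column names vary by source
```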
Been reading about the "bigger models = better AI" narrative getting pushed back today.
@thomwolf tackled this head-on at Web Summit and highlighted how important small models are (and why closed-source companies haven't pushed for them 😬). They're crushing it: today's 1B-parameter models outperform last year's 10B ones.
Fascinating to hear him talk about the secret sauce behind this approach.
Fascinating point from @thomwolf at Web Summit: AI misuse (deepfakes, fake news) is actually easier with closed models than with open-source ones.
This challenges the common narrative that open-source AI is inherently more dangerous. The reality is more nuanced: while open source may seem technically easier to misuse, closed models' accessibility and product-focused design appear to be driving more actual harm.
Important context for current AI safety discussions and regulation debates.
🤯 AI progress keeps blowing my mind! Just experienced Qwen's new Coder demo - built a complete flashcard web app with a single prompt. The results are incredible!
This demo is part of the new Qwen2.5-Coder family (0.5B to 32B models), surpassing or matching GPT-4o and Claude 3.5 Sonnet across multiple coding benchmarks.
- 128K context window for the 14B/32B models
- Drop-in replacement for GPT-4 in Cursor & Artifacts
- Models on the Hub under an Apache 2.0 license
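If you'd rather run the weights yourself, the Instruct checkpoints load through the standard transformers chat API (a minimal sketch; the repo id is the published Hub name, but the generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"  # smaller sizes also on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a flashcard web app in a single HTML file."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```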
Just tested Argilla's new data annotation feature - it's a game changer for AI project quality.
Upload CSVs, work with published datasets, or improve existing ones directly on the Hugging Face Hub. Setup took < 2 minutes, no code needed (see the example below, where I selected a dataset to classify tweets into categories).
Real world impact: Missing in Chicago won a Pulitzer using a similar approach - 200 volunteers labeled police misconduct files to train their model. That's the power of good data annotation.
Three immediate use cases I see:
- Build collaborative training sets with your community (surprisingly underused in AI journalism)
- Turn your website chatbot logs into high-quality fine-tuning data
- Compare generated vs. published content (great for SEO headlines)
Works for solo projects or teams of up to 100 people. All integrated with the Hugging Face Hub for immediate model training.
Interesting to see tools like this making data quality more accessible. Data quality is the hidden driver of AI success that we don't talk about enough.
⚡ Mixture of Experts (MoE) architecture: 389B parameters in total, but only 52B are activated for any input
🧪 Trained on 7T tokens, including 1.5T tokens of synthetic data
🏗️ Architecture: novel "recycle routing" prevents token dropping when experts are overloaded (see the sketch after this list)
📊 Great benchmark results: surpasses Llama-3.1-405B-Instruct on most benchmarks even though it has ~8x fewer active parameters ‣ Impressive perf on MATH: 77.4
🐋 Large context length: up to 256K tokens
🔒 License: ‣ Commercial use allowed, except if your products have >100M monthly active users ‣ No access in the EU
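The report's exact routing mechanism is more involved, but the core idea is easy to sketch: in vanilla capacity-limited top-1 routing, a token whose chosen expert is full simply gets dropped; recycle routing re-homes it to the next-best expert with spare capacity instead. Here's a toy illustration (my own sketch, not the model's actual implementation):

```python
import torch

def recycle_route(router_logits: torch.Tensor, capacity: int) -> torch.Tensor:
    """Toy top-1 routing where tokens that overflow a full expert are
    'recycled' to their next-best expert instead of being dropped.

    router_logits: (num_tokens, num_experts) scores from the router.
    Returns an expert index per token (-1 only if every expert is full).
    """
    num_tokens, num_experts = router_logits.shape
    # Each token's experts, ranked best to worst
    ranked = router_logits.argsort(dim=-1, descending=True)
    load = torch.zeros(num_experts, dtype=torch.long)
    assignment = torch.full((num_tokens,), -1, dtype=torch.long)
    for t in range(num_tokens):
        for expert in ranked[t]:  # start at the token's top-1 expert
            if load[expert] < capacity:
                assignment[t] = expert
                load[expert] += 1
                break
            # Expert full: vanilla top-1 routing would drop the token here;
            # recycle routing falls through to the next-best expert instead.
    return assignment

# 8 tokens, 4 experts, capacity 2 -> every token keeps an expert
print(recycle_route(torch.randn(8, 4), capacity=2))
```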
🎙️ "We need digital sobriety." @sasha challenges Big Tech's race for nuclear energy on BBC AI Decoded. Instead of pursuing more power, shouldn't we first ask if we really need AI everywhere?
First AI Journalism Lab cohort just wrapped - endless inspiration for newsrooms:
- Ludwig Siegele built an AI style checker for The Economist
- Rodney Gibbs created a tool helping small newsrooms analyze stories through user needs
- Monsur Hussain developed an AI trend-monitoring system for fact-checking WhatsApp claims
- David Cohn built a system for analyzing audience engagement
- Clare Spencer crafted video personas with AI
The insights on adoption during the discussion were fascinating - their approach really resonated with me. Instead of forcing AI tools onto teams, they emphasized getting skeptics involved early in testing and creating safe spaces for open discussion. Start small with enthusiastic participants, build a community of internal AI champions, and focus on solving specific problems rather than pushing for adoption.
As a coach, I also learned a lot. My 5 key takeaways:
- Newsrooms are bursting with AI x journalism innovation
- Internal alignment > technical challenges. Strong dev/PM relationships = magic
- Early prototyping + user involvement = better adoption. Set realistic expectations & embrace feedback
- Cross-newsroom collaboration supercharges innovation
- Great products can emerge in weeks with proper scoping
🔍 NYT leveraged AI to investigate election interference by analyzing 400+ hours of recorded meetings - that's 5M words of data!
AI spotted patterns, humans verified facts. Every AI-flagged quote was manually verified against source recordings. Really appreciate that they published their full methodology - transparency matters when using AI in journalism.
A perfect blend of tech & journalism.
The future of journalism isn't robots replacing reporters - it's AI helping humans process massive datasets more efficiently. Sometimes the most powerful tech solutions are the least flashy ones.
Dive into multi-model evaluations, pinpoint the best model for your needs, and explore insights across top open LLMs all in one place. Ready to level up your model comparison game?
This is no Woodstock AI, but it will be fun nonetheless haha. I'll be hosting a live workshop with team members next week about the Enterprise Hugging Face Hub.
1,000 spots available, first-come first-served, with some surprises during the stream!
🤯 Plot twist: Size isn't everything in AI! A lean 32B parameter model just showed up to the party and outperformed a 70B one. Efficiency > Scale? The AI world just got more interesting...
Cohere For AI released Aya Expanse, a new family of multilingual models (8B and 32B) spanning 23 popular languages.
Just watched @thomwolf tear down the over-hyped AGI narrative in 30 seconds - and it's refreshingly grounded.
No wild speculation about superintelligence timelines or consciousness. Just practical insights from someone who really understands the technology.
This is the kind of level-headed perspective that helps us focus on what AI can actually do today (which is already transformative) rather than getting lost in AGI fantasy. Worth your time if you want to understand AI progress without the hype.
New York Times to Perplexity: Stop Using Our Stuff
The publisher has sent generative-AI startup Perplexity a “cease and desist” notice demanding that the firm stop accessing and using its content, according to a copy of the letter reviewed by The Wall Street Journal.
Perplexity CEO Aravind Srinivas said in an interview that Perplexity isn’t ignoring the Times’s efforts to block crawling of its site. He said the company plans on responding to the legal notice by the Times’s deadline of Oct. 30.
“We are very much interested in working with every single publisher, including the New York Times,” Srinivas said. “We have no interest in being anyone’s antagonist here.”