Project Overview
3D Llama Studio is an all-in-one AI platform that generates high-quality 3D models and stylized images from text or image inputs.
Key Features
Text/Image to 3D Conversion
- Generate 3D models from detailed text descriptions or reference images
- Intuitive user interface
Text to Styled Image Generation
- Customizable image generation settings
- Adjustable resolution, generation steps, and guidance scale (see the generation-call sketch below)
- Supports both English and Korean prompts
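To illustrate how these settings typically map onto a generation call, here is a minimal sketch assuming a Hugging Face diffusers text-to-image pipeline; the model ID and backend below are placeholders, not necessarily what 3D Llama Studio actually uses.

```python
# Minimal sketch, assuming a diffusers text-to-image pipeline (placeholder model ID).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder; the studio's actual backend is not specified
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a stylized ceramic llama figurine, studio lighting",
    height=1024,                # adjustable resolution
    width=1024,
    num_inference_steps=30,     # generation steps
    guidance_scale=7.5,         # guidance scale
    generator=torch.Generator("cuda").manual_seed(42),  # fixed or random seed
).images[0]
image.save("styled_llama.png")
```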
Technical Features
- Gradio-based web interface (see the interface sketch below)
- Dark theme UI/UX
- Real-time image generation and 3D modeling
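A minimal sketch of what a Gradio front end with these controls could look like; the component labels and the `generate_image` callback are illustrative assumptions, not the project's actual code.

```python
# Illustrative Gradio layout only; the real app's components and callbacks may differ.
import gradio as gr

def generate_image(prompt, resolution, steps, guidance, seed):
    # Placeholder: call the text-to-image backend here and return an image.
    raise NotImplementedError

with gr.Blocks(title="3D Llama Studio") as demo:
    prompt = gr.Textbox(label="Prompt (English or Korean)")
    with gr.Row():
        resolution = gr.Slider(512, 2048, value=1024, step=64, label="Resolution")
        steps = gr.Slider(10, 100, value=30, step=1, label="Generation steps")
        guidance = gr.Slider(1.0, 20.0, value=7.5, step=0.5, label="Guidance scale")
        seed = gr.Number(value=-1, label="Seed (-1 = random)")
    output = gr.Image(label="Result")
    gr.Button("Generate").click(
        generate_image,
        inputs=[prompt, resolution, steps, guidance, seed],
        outputs=output,
    )

demo.launch()
```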
Highlights
- User-friendly interface
- Real-time preview
- Random seed generation
- High-resolution output support (up to 2048x2048)
Applications
- Product design
- Game asset creation
- Architectural visualization
- Educational 3D content
The most difficult part was getting the model running in the first place, but the next steps are simple:
- Implement sentence splitting, allowing for streamed responses (a rough sketch follows below)
- Multilingual support (only phonemization left)
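A rough sketch of sentence splitting for streamed synthesis, using a simple regex heuristic; this is a generic approach, not necessarily the implementation the author has in mind, and `synthesize` is a hypothetical stand-in for the model call.

```python
# Split on sentence-final punctuation so audio can be generated and streamed
# sentence by sentence instead of waiting for the whole response.
import re

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text: str) -> list[str]:
    """Split text on sentence-final punctuation followed by whitespace."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]

def synthesize(sentence: str) -> bytes:
    # Hypothetical placeholder for the actual TTS model call.
    return sentence.encode("utf-8")

def stream_response(text: str):
    """Yield audio chunk by chunk so playback can start before the full text is synthesized."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```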
This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.
Key insights:
MoE with novel hybrid attention:
- Mixture of Experts with 456B total parameters (45.9B activated per token)
- Combines Lightning attention (linear complexity) for most layers with traditional softmax attention every 8 layers (a minimal sketch of this stacking pattern follows below)
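A minimal sketch of the hybrid stacking pattern described above: linear attention in most layers, full softmax attention every 8th layer. The dimensions, the ELU+1 feature map, and the single-head, non-causal simplification are assumptions for illustration, not MiniMax's actual implementation.

```python
# Illustrative hybrid-attention stacking; MLPs/experts and causal masking are omitted.
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: O(n^2) in sequence length n.  q, k, v: (n, d)
    scores = q @ k.T / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Linear (lightning-style) attention: O(n) in sequence length,
    # using a simple ELU+1 feature map as a stand-in kernel.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.T @ v                                 # (d, d) summary, independent of n
    z = q @ k.sum(dim=0, keepdim=True).T + eps   # per-position normalizer, (n, 1)
    return (q @ kv) / z

def hybrid_stack(x, layers=16, softmax_every=8):
    # Every 8th layer uses full softmax attention; the rest use linear attention.
    for i in range(layers):
        attn = softmax_attention if (i + 1) % softmax_every == 0 else linear_attention
        x = x + attn(x, x, x)                    # residual connection
    return x

x = torch.randn(1024, 64)                        # (sequence length, hidden size)
print(hybrid_stack(x).shape)                     # torch.Size([1024, 64])
```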
Outperforms leading models across benchmarks while offering a vastly longer context:
- Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
- Can efficiently handle 4M-token contexts (vs. 256K for most other LLMs)
Technical innovations enable efficient scaling:
- Novel expert-parallel and tensor-parallel strategies cut communication overhead in half
- Improved linear attention sequence parallelism, multi-level padding, and other optimizations achieve 75% GPU utilization, which is notably high; utilization is typically around 50% (a rough utilization calculation follows below)
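A back-of-the-envelope way to read that 75% figure, assuming it refers to model FLOPs utilization (MFU); the throughput and hardware numbers below are illustrative, not reported values.

```python
# Rough MFU estimate: model FLOPs actually computed divided by hardware peak FLOPs.
def mfu(tokens_per_sec, active_params, peak_flops_per_sec):
    # ~6 FLOPs per activated parameter per token for a training step (forward + backward).
    model_flops_per_sec = 6 * active_params * tokens_per_sec
    return model_flops_per_sec / peak_flops_per_sec

# Example with a made-up per-GPU throughput on an H800-class GPU (~989 TFLOPs BF16 peak):
print(f"{mfu(2_700, 45.9e9, 989e12):.0%}")  # ~75%
```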
Thorough training strategy:
- Careful data curation and quality control, using a smaller preliminary version of their LLM as a judge (a hedged sketch of this filtering loop follows below)
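A hedged sketch of what LLM-as-judge data filtering can look like; the rubric, threshold, and `judge_model.score` helper are hypothetical, not MiniMax's actual pipeline.

```python
# Keep only documents that a smaller "judge" model rates above a quality threshold.
RUBRIC = (
    "Rate the following document from 1 to 5 for factuality, coherence, "
    "and usefulness as LLM training data. Reply with a single integer."
)

def filter_corpus(documents, judge_model, threshold=4):
    """Filter a corpus using a smaller preliminary LLM as the quality judge."""
    kept = []
    for doc in documents:
        score = judge_model.score(prompt=f"{RUBRIC}\n\n{doc}")  # hypothetical call
        if score >= threshold:
            kept.append(doc)
    return kept
```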
Overall, not only is the model impressive, but the technical paper is also really interesting! It is full of insights, including a great comparison showing how a MoE with 2B activated parameters (24B total) far outperforms a dense 7B model for the same amount of training FLOPs (a rough version of that arithmetic follows below).
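The intuition behind that iso-FLOPs comparison, using the standard per-token cost approximation; the parameter counts come from the post, everything else is back-of-the-envelope.

```python
# Per token, compute scales with *activated* parameters (~6 * N_active FLOPs per
# training token), so a 24B-total MoE that activates only ~2B parameters can be
# trained on ~3.5x more tokens than a dense 7B model at the same total compute.
moe_active_params = 2e9
dense_params = 7e9
flops_per_token_moe = 6 * moe_active_params   # ~1.2e10 FLOPs per token
flops_per_token_dense = 6 * dense_params      # ~4.2e10 FLOPs per token
print(flops_per_token_dense / flops_per_token_moe)  # 3.5 -> the MoE sees ~3.5x more tokens
```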