NFTCID
AI & ML interests: None yet
Organizations: None yet

NFTCID's activity

reacted to ginipick's post with 🔥 4 days ago
🌟 3D Llama Studio - AI 3D Generation Platform

📝 Project Overview
3D Llama Studio is an all-in-one AI platform that generates high-quality 3D models and stylized images from text or image inputs.

✨ Key Features

Text/Image to 3D Conversion 🎯

Generate 3D models from detailed text descriptions or reference images
Intuitive user interface

Text to Styled Image Generation 🎨

Customizable image generation settings
Adjustable resolution, generation steps, and guidance scale
Supports both English and Korean prompts

🛠️ Technical Features

Gradio-based web interface
Dark theme UI/UX
Real-time image generation and 3D modeling

💫 Highlights

User-friendly interface
Real-time preview
Random seed generation
High-resolution output support (up to 2048x2048)

🎯 Applications

Product design
Game asset creation
Architectural visualization
Educational 3D content

🔗 Try It Now!
Experience 3D Llama Studio:

ginigen/3D-LLAMA

#AI #3DGeneration #MachineLearning #ComputerVision #DeepLearning
reacted to Xenova's post with 👍 4 days ago
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?
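That sentence-splitting step can be sketched in a few lines. Below is a naive, regex-based splitter (an illustration, not Kokoro's actual implementation), assuming sentences end with `.`, `!`, or `?` — the idea being that each sentence can be synthesized and played back as soon as it is ready instead of waiting for the whole text:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naively split text on sentence-ending punctuation.

    Keeps the punctuation attached to its sentence via a lookbehind,
    splitting only on the whitespace that follows it.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def stream_tts(text, synthesize):
    """Yield audio sentence by sentence instead of all at once.

    `synthesize` is a placeholder for a call into the TTS model.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

A production splitter would also need to handle abbreviations ("Dr."), decimals ("3.14"), and quoted speech, which is why sentence splitting is listed as real work rather than a one-liner.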
reacted to m-ric's post with 👀 26 days ago
š— š—¶š—»š—¶š— š—®š˜…'š˜€ š—»š—²š˜„ š— š—¼š—˜ š—Ÿš—Ÿš—  š—暝—²š—®š—°š—µš—²š˜€ š—–š—¹š—®š˜‚š—±š—²-š—¦š—¼š—»š—»š—²š˜ š—¹š—²š˜ƒš—²š—¹ š˜„š—¶š˜š—µ šŸ°š—  š˜š—¼š—øš—²š—»š˜€ š—°š—¼š—»š˜š—²š˜…š˜ š—¹š—²š—»š—“š˜š—µ šŸ’„

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

š—žš—²š˜† š—¶š—»š˜€š—¶š—“š—µš˜š˜€:

šŸ—ļø MoE with novel hybrid attention:
ā€£ Mixture of Experts with 456B total parameters (45.9B activated per token)
ā€£ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers
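That hybrid layer layout can be sketched as a simple schedule — this is my reading of the description above, not MiniMax's code, and the exact layer indexing is an assumption:

```python
def attention_kind(layer_idx: int, period: int = 8) -> str:
    """Hybrid schedule: softmax attention on every `period`-th layer
    (layers 7, 15, 23, ... when 0-indexed), lightning (linear)
    attention everywhere else."""
    return "softmax" if (layer_idx + 1) % period == 0 else "lightning"

# For a 16-layer stack: 14 lightning layers, 2 softmax layers.
schedule = [attention_kind(i) for i in range(16)]

# Sparse activation: only ~10% of the MoE's weights are touched per token.
activated_fraction = 45.9 / 456
```

The point of the schedule is that the occasional softmax layer restores full pairwise attention while the linear-complexity lightning layers keep the cost of very long contexts manageable.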

šŸ† Outperforms leading models across benchmarks while offering vastly longer context:
ā€£ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
ā€£ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert-parallel and tensor-parallel strategies cut communication overhead in half
‣ Improved linear-attention sequence parallelism, multi-level padding, and other optimizations achieve 75% GPU utilization (notably high — around 50% is typical)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights, including a great comparison showing how a 2B-activated MoE (24B total) far outperforms a 7B dense model for the same amount of FLOPs.
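A back-of-envelope sketch of why that comparison is FLOPs-matched — assuming the common ~6·N·D training-FLOPs approximation and reading "2B" as activated parameters (the paper's exact accounting may differ):

```python
def train_flops(active_params: float, tokens: float) -> float:
    # Standard approximation: ~6 FLOPs per *active* parameter per training token.
    return 6 * active_params * tokens

dense_active = 7e9  # 7B dense model: every parameter is active for every token
moe_active = 2e9    # 24B-total MoE with ~2B parameters activated per token

# At an equal compute budget, the MoE can train on 3.5x as many tokens.
token_multiplier = dense_active / moe_active
assert train_flops(moe_active, token_multiplier * 1e12) == train_flops(dense_active, 1e12)
```

So the MoE's advantage in such comparisons comes from spending the same compute across far more data (or, equivalently, far more total parameters) than the dense baseline.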

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here (license allows commercial use below 100M monthly users) 👉 MiniMaxAI/MiniMax-Text-01