Dokyoon

leeloolee

AI & ML interests

ai

Recent Activity

upvoted a paper 9 days ago
GUI Agents: A Survey
reacted to m-ric's post with 👍 15 days ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

Organizations

sionic-ai · Multi🤖Transformers · 인스트럭트.한국 · AI Safeguard

leeloolee's activity

reacted to m-ric's post with 👍 15 days ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.
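(For scale, a quick check of that figure, using only the number quoted above:)

```python
# Quick check of the "4.5 thousand years" figure, using only the number quoted above.
gpu_hours = 39_000_000
hours_per_year = 24 * 365            # 8,760
print(gpu_hours / hours_per_year)    # ≈ 4,452 years on a single GPU
```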

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons."

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.
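To make the "4 dimensions" concrete, here is a minimal sketch (not Picotron's or Nanotron's actual API; the parallel degrees below are made up) of how a 4D setup factorizes the GPU count into data x tensor x context x pipeline groups, with every rank owning one coordinate along each axis:

```python
# Minimal 4D-parallelism sketch: factor the GPU count into four parallel degrees
# and assign each global rank a (data, tensor, context, pipeline) coordinate.
import itertools

dp, tp, cp, pp = 4, 2, 1, 3                 # hypothetical parallel degrees
world_size = dp * tp * cp * pp              # must equal the number of GPUs (24 here)

mesh = {
    rank: coord
    for rank, coord in enumerate(itertools.product(range(dp), range(tp), range(cp), range(pp)))
}
assert len(mesh) == world_size
print(mesh[0], mesh[world_size - 1])        # (0, 0, 0, 0) and (3, 1, 0, 2)
```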

🤏 But now we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D-parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code, a real feat of engineering that makes it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful:
Measured in MFU (Model FLOPs Utilization, i.e. how much of the hardware's compute potential the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what the huge libs reach. (Caution: the team is running further benchmarks to verify this.)
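For intuition, MFU is achieved training FLOPs per second divided by the hardware's theoretical peak. A back-of-the-envelope sketch of what the ~50% figure would imply; the 6N FLOPs-per-token rule and the H100 peak number are common approximations assumed here, not the team's benchmark setup:

```python
# Back-of-the-envelope reading of the ~50% MFU figure (illustrative, not a measured result).
n_params = 1.7e9                    # SmolLM-1.7B
peak_flops_per_gpu = 989e12         # assumed H100 dense BF16 peak, FLOPs/s
num_gpus = 8
mfu = 0.50                          # the figure quoted above

achieved_flops = mfu * peak_flops_per_gpu * num_gpus
tokens_per_second = achieved_flops / (6 * n_params)    # ~6*N training FLOPs per token
print(f"{tokens_per_second:,.0f} tokens/s")            # ≈ 388,000 tokens/s aggregate
```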

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
reacted to alimotahharynia's post with 🔥 15 days ago
Here's the Space for our new article, which leverages LLMs with reinforcement learning to design high-quality small molecules. Check it out at alimotahharynia/GPT-2-Drug-Generator. You can also access the article here: https://arxiv.org/abs/2411.14157.
I would be happy to receive your feedback.
reacted to cutechicken's post with ❤️ 15 days ago
🚀 RAGOndevice: High-Performance Local AI Document Analysis Assistant
💫 Core Value
RAGOndevice is a high-performance AI system running locally without cloud dependency. Using CohereForAI's optimized 7B model, it enables professional-grade document analysis on standard PCs. ✨
🌟 Ondevice AI Advantages
1. 🔋 Efficient Resource Utilization

🎯 Optimized 7B Model: Runs on standard PCs
⚡ Local Processing: Instant response without the cloud
💻 Low-Spec Compatible: Performs well on regular GPUs
🔄 Optimized Memory: Ensures stable operation

2. 🛡️ Data Security & Cost Efficiency

🔒 Complete Privacy: No external data transmission
🌐 Offline Operation: No internet required
💰 No Subscription: One-time installation
⚙️ Resource Optimization: Uses existing hardware

🎮 Key Features
1. 📊 Powerful Document Analysis

📁 Multi-Format Support: TXT, CSV, PDF, Parquet
🧠 Intelligent Analysis: Automatic structure recognition
👁️ OCR Support: Advanced PDF text extraction
💬 Real-time Chat: Natural language interaction

2. 🔍 Local RAG System

🎯 Efficient Search: TF-IDF based local search (see the sketch after this list)
🧩 Context Understanding: Accurate information retrieval
📚 Wikipedia Integration: Rich background knowledge
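A minimal sketch of what TF-IDF based local search typically looks like, using scikit-learn; the chunks, query, and scoring here are illustrative placeholders, not RAGOndevice's actual implementation:

```python
# Minimal TF-IDF retrieval step for a local RAG pipeline (illustrative placeholder code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "Quarterly revenue grew 12% year over year.",
    "The model runs fully offline on a single consumer GPU.",
    "Parquet files are loaded column by column for efficiency.",
]  # placeholder document chunks

vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(chunks)      # one TF-IDF vector per chunk

query = "Does it work without an internet connection?"
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, chunk_vectors)[0]
best = scores.argmax()
print(chunks[best])   # the top-ranked chunk is passed to the local model as context
```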

🎯 Use Cases

🏢 Enterprise: Secure confidential document processing
🔬 Personal Research: Private data analysis
📚 Education: Personal learning material analysis
💻 Development: Local codebase analysis

⭐ Differentiators

🏃‍♂️ Independent Operation: Zero cloud dependency
⚡ Instant Response: No network latency
🔐 Complete Security: Full data control
💎 Cost Efficiency: No ongoing costs

🔮 Future Plans

🚀 Enhanced model optimization
📚 Local knowledge base expansion
⚡ Hardware optimization
📁 Extended file support


🌟 RAGOndevice democratizes high-performance AI, providing the optimal local AI solution for security-sensitive environments. 🚀

🔥 Power of Local AI: Experience enterprise-grade AI capabilities right on your device!

VIDraft/RAGOndevice
reacted to julien-c's post with 🔥 23 days ago
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free and (barring blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
upvoted an article about 1 month ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

By mikelabs • 11