neuralink

AI & ML interests

None yet

Recent Activity

upvoted a paper 11 days ago

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

upvoted a paper 17 days ago

Scaling Laws for Floating Point Quantization Training

liked a Space about 1 month ago

sail/scaling-with-vocab-demo

Articles

A failed experiment: Infini-Attention, and why we should keep trying?

Aug 14, 2024 • 57

Organizations

neuralink's activity

upvoted a paper 11 days ago

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

Paper • 2409.15241 • Published Sep 23, 2024 • 1

upvoted a paper 17 days ago

Scaling Laws for Floating Point Quantization Training

Paper • 2501.02423 • Published 21 days ago • 25

liked 2 Spaces about 1 month ago

Scaling With Vocab Demo

Harm Space

tencent/Tencent-Hunyuan-Large

Text Generation • Updated 7 days ago • 357 • 557

upvoted a paper 2 months ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 255

reacted to ArthurZ's post with 🔥 2 months ago

Post

2993

Native tensor parallel has landed in transformers!!! https://github.com/huggingface/transformers/pull/34184 thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥

liked a model 4 months ago

meta-llama/Llama-3.2-11B-Vision

Image-Text-to-Text • Updated Sep 27, 2024 • 84.5k • 442

updated 2 models 4 months ago

nanotron/temp_for_pr_review

Updated Sep 24, 2024

nanotron/fp8_for_nanotron

Updated Sep 21, 2024

upvoted a paper 5 months ago

Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 20

upvoted an article 5 months ago

Article

How NuminaMath Won the 1st AIMO Progress Prize

Jul 11, 2024 • 111

upvoted a paper 5 months ago

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Paper • 2201.02177 • Published Jan 6, 2022 • 2

upvoted an article 5 months ago

Article

A failed experiment: Infini-Attention, and why we should keep trying?

Aug 14, 2024 • 57

upvoted a paper 5 months ago

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Paper • 2405.20233 • Published May 30, 2024 • 6

upvoted a paper 6 months ago

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8, 2024 • 156

updated a model 6 months ago

nanotron/llama3-8b-infini-attention

Updated Aug 5, 2024 • 7 • 3

New activity in huggingface/documentation-images 6 months ago

infini-attention

#352 opened 6 months ago by neuralink

liked a model 6 months ago

nanotron/llama3-8b-infini-attention

Updated Aug 5, 2024 • 7 • 3

liked a dataset 6 months ago

huggingface/documentation-images

Viewer • Updated 1 day ago • 50 • 2.56M • 47
