Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13 • 49
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 20 days ago • 288
Releasing the largest multilingual open pretraining dataset Article • By Pclanglais • Published Nov 13 • 98
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 20 days ago • 425
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published Jun 11 • 37
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 253
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model Paper • 2401.09417 • Published Jan 17 • 59
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 70
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 65
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 57
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 242
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding Paper • 2307.02499 • Published Jul 4, 2023 • 15
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation Paper • 2306.07954 • Published Jun 13, 2023 • 112