Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN • arXiv:2412.13795 • Published Dec 18, 2024
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers • arXiv:2303.01610 • Published Mar 2, 2023
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients • arXiv:2407.11239 • Published Jul 15, 2024
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients • arXiv:2407.08296 • Published Jul 11, 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding • arXiv:2403.04797 • Published Mar 5, 2024
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention • arXiv:2310.00535 • Published Oct 1, 2023
Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights? • arXiv:2302.12480 • Published Feb 24, 2023
You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership • arXiv:2111.00162 • Published Oct 30, 2021
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy • arXiv:2310.01334 • Published Oct 2, 2023
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference • arXiv:2402.09398 • Published Feb 14, 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • Published Mar 6, 2024
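
The last entry names its core mechanism directly in the title: gradient low-rank projection. As a reading aid, here is a minimal sketch of that idea. Everything below is an illustrative assumption, not the paper's reference implementation: the GaLoreAdam class name, the hyperparameter defaults (rank, update_proj_gap, scale), and the toy training loop are all made up for this example.

```python
import torch

class GaLoreAdam:
    """Minimal sketch of gradient low-rank projection (GaLore, arXiv:2403.03507).

    Adam moments are stored in a rank-r subspace of the gradient, so optimizer
    state shrinks from (m, n) to roughly (r, n) per weight matrix. Simplified
    illustration under stated assumptions: a real implementation handles
    per-layer shapes, projecting along the smaller dimension, and moment
    carry-over across projector refreshes.
    """

    def __init__(self, rank=8, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 update_proj_gap=200, scale=0.25):
        self.rank, self.lr, self.eps = rank, lr, eps
        self.b1, self.b2 = betas
        self.update_proj_gap, self.scale = update_proj_gap, scale
        self.step_count = 0
        self.local_step = 0     # steps since the projector was last refreshed
        self.P = None           # (m, r) projector: top-r left singular vectors
        self.m = self.v = None  # Adam moments, kept in the low-rank space

    @torch.no_grad()
    def step(self, weight: torch.Tensor, grad: torch.Tensor):
        self.step_count += 1
        # Periodically refresh the projector from the current gradient's SVD.
        if self.P is None or (self.step_count - 1) % self.update_proj_gap == 0:
            U, _, _ = torch.linalg.svd(grad, full_matrices=False)
            self.P = U[:, : self.rank]                       # (m, r)
            self.m = torch.zeros(self.rank, grad.shape[1])   # reset moments
            self.v = torch.zeros_like(self.m)
            self.local_step = 0
        self.local_step += 1
        # Project the full gradient into the rank-r subspace.
        R = self.P.T @ grad                                  # (r, n)
        # Standard Adam update, but on the projected gradient.
        self.m = self.b1 * self.m + (1 - self.b1) * R
        self.v = self.b2 * self.v + (1 - self.b2) * R * R
        m_hat = self.m / (1 - self.b1 ** self.local_step)
        v_hat = self.v / (1 - self.b2 ** self.local_step)
        step_dir = m_hat / (v_hat.sqrt() + self.eps)
        # Project the update back to full size and apply it.
        weight -= self.lr * self.scale * (self.P @ step_dir)


# Toy usage: fit one weight matrix to a random linear-regression target.
torch.manual_seed(0)
W = torch.randn(64, 32)
opt = GaLoreAdam(rank=8)
x, y = torch.randn(16, 32), torch.randn(16, 64)
for _ in range(100):
    resid = x @ W.T - y                  # (16, 64)
    grad = 2 * resid.T @ x / x.shape[0]  # dMSE/dW, shape (64, 32)
    opt.step(W, grad)
print("final loss:", (x @ W.T - y).pow(2).mean().item())
```

The memory saving comes from the moments `m` and `v` living in an (r, n) space instead of the full (m, n) space, while the SVD cost of refreshing the projector amortizes over `update_proj_gap` steps.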