Edd

Erland

AI & ML interests

None yet

Recent Activity

liked a model 3 days ago

deepseek-ai/DeepSeek-V3-Base

updated a model 4 days ago

Erland/Llama-3.2-3B-JAX

updated a model 4 days ago

Erland/Llama-3.2-1B-JAX

Organizations

None yet

Erland's activity

upvoted 8 collections 27 days ago

upvoted a collection 3 months ago

LLM Reasoning Papers

Collection

Papers to improve reasoning capabilities of LLMs • 17 items • Updated 9 days ago • 91

upvoted 2 papers 4 months ago

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Paper • 2409.07146 • Published Sep 11 • 19

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5 • 88

upvoted an article 5 months ago

Article

A failed experiment: Infini-Attention, and why we should keep trying?

Aug 14 • 54

upvoted an article 7 months ago

Article

Indexify: Bringing HuggingFace Models to Real-Time Pipelines for Production Applications

May 31 • 7

upvoted a collection 7 months ago

Blackhole

Collection

A black hole of high-quality, multilingual dialogue datasets across many fields, making it easier to train LLMs with SFT and DPO methods. • 32 items • Updated Aug 18 • 6

upvoted a paper 8 months ago

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118

upvoted a paper 9 months ago

Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs

Paper • 2403.20041 • Published Mar 29 • 34

upvoted 2 papers 10 months ago

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Paper • 2403.09347 • Published Mar 14 • 20

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14 • 7

upvoted a collection 11 months ago

Model Merging

Collection

Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12 • 218

upvoted a paper about 1 year ago

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Paper • 2312.09390 • Published Dec 14, 2023 • 32