Barbara

Babl21

AI & ML interests

None yet

Recent Activity

upvoted an article 17 days ago

They Said It Couldn’t Be Done

upvoted an article about 1 month ago

SauerkrautLM's Multi-Phase Spectrum Training: A Technical Deep Dive

upvoted a paper about 2 months ago

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

View all activity

Organizations

None yet

Babl21's activity

upvoted an article 17 days ago

Article

They Said It Couldn’t Be Done

•

18 days ago

• 75

upvoted an article about 1 month ago

Article

SauerkrautLM's Multi-Phase Spectrum Training: A Technical Deep Dive

•

Nov 9

• 9

upvoted a paper about 2 months ago

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Paper • 2410.23743 • Published Oct 31 • 59

upvoted a paper 4 months ago

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Paper • 2408.13467 • Published Aug 24 • 24

upvoted 2 papers 5 months ago

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Paper • 2407.16154 • Published Jul 23 • 21

Longhorn: State Space Models are Amortized Online Learners

Paper • 2407.14207 • Published Jul 19 • 17

upvoted 2 papers 6 months ago

LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages

Paper • 2407.05975 • Published Jul 8 • 34

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 95

upvoted 9 papers 7 months ago

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Paper • 2406.01574 • Published Jun 3 • 43

Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning

Paper • 2406.00392 • Published Jun 1 • 12

Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published May 30 • 29

Xwin-LM: Strong and Scalable Alignment Practice for LLMs

Paper • 2405.20335 • Published May 30 • 17

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published May 23 • 37

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23 • 36

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126

Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published May 16 • 26

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published May 14 • 27

upvoted a paper 8 months ago

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 27

upvoted 2 papers 9 months ago

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 104

RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9 • 34