Swasti Sweker

Swekerr

AI & ML interests

None yet

Swekerr's activity

upvoted an article 3 days ago

Putting RL back in RLHF

Article • Published Jun 12, 2024 • 84

upvoted a paper 21 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 23 days ago • 164

upvoted an article 26 days ago

Introducing smolagents: simple agents that write actions in code.

Article • Published Dec 31, 2024 • 869

upvoted a paper 29 days ago

TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published about 1 month ago • 47

upvoted 3 articles about 1 month ago

Zero to Hero with the TRL learning link bomb 💣

Article • Published Nov 25, 2024 • 5

Janus Pro: DeepSeek's Revolutionary Multimodal AI Model

Article • Published Jan 28 • 31

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Article • Published Aug 25, 2023 • 28

upvoted 2 articles about 2 months ago

Mastering Long Contexts in LLMs with KVPress

Article • Published Jan 23 • 64

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • Published Jan 23 • 154

upvoted 2 papers about 2 months ago

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21 • 63

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published Jan 16 • 23

upvoted 4 articles about 2 months ago

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

Article • Published May 7, 2024 • 60

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Published Dec 9, 2022 • 197

Mixture of Experts Explained

Article • Published Dec 11, 2023 • 453

Training and Finetuning Embedding Models with Sentence Transformers v3

Article • Published May 28, 2024 • 193

upvoted 2 papers about 2 months ago

PokerBench: Training Large Language Models to become Professional Poker Players

Paper • 2501.08328 • Published Jan 14 • 17

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 276

upvoted an article about 2 months ago

Mastering Tensor Dimensions in Transformers

Article • Published Jan 12 • 49

upvoted 2 papers about 2 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 92

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 84