No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 6 days ago • 37
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 3 days ago • 78
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 4 days ago • 93
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 13 days ago • 56
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmented Generation Paper • 2410.23090 • Published Oct 30 • 54
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Paper • 2410.23743 • Published Oct 31 • 59
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7 • 111
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12 • 62
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection Paper • 2411.08868 • Published Nov 13 • 12
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11 • 34
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6 • 30
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7 • 37
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 25 days ago • 437
Parakeet Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 8 items • Updated Oct 1 • 20