Sebastian Gabarain

Locutusque · SebastianG74019

AI & ML interests

Pushing performance in small language models

Recent Activity

reacted to nroggendorff's post with 😔 7 days ago

liked a model 7 days ago

answerdotai/ModernBERT-large

liked a model 8 days ago

microsoft/BiomedParse

Locutusque's activity

upvoted a paper about 1 month ago

Cut Your Losses in Large-Vocabulary Language Models

Paper • 2411.09009 • Published Nov 13 • 43

upvoted a paper 3 months ago

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19 • 47

upvoted 2 papers 6 months ago

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

Paper • 2407.08348 • Published Jul 11 • 50

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 34

upvoted an article 7 months ago

Uncensor any LLM with abliteration

Article • Published Jun 13 • 386

upvoted 3 papers 7 months ago

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Paper • 2405.19327 • Published May 29 • 46

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15 • 87

upvoted a collection 8 months ago

Yi-1.5 (2024/05)

10 items • Updated May 20 • 91

upvoted 2 articles 8 months ago

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

Article • Published May 7 • 7

Introducing the Open Chain of Thought Leaderboard

Article • Published Apr 23 • 27

upvoted an article 9 months ago

Fine-tune Llama 3 with ORPO

Article • Published Apr 22 • 229

upvoted a collection 9 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 25 days ago • 698

upvoted a paper 9 months ago

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Paper • 2404.07647 • Published Apr 11 • 4

upvoted a collection 9 months ago

OpenCerebrum-2.0

My open source take on Aether Research's proprietary Cerebrum dataset. • 3 items • Updated Apr 13 • 2

upvoted 2 papers 9 months ago

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Paper • 2404.05961 • Published Apr 9 • 64

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 104

upvoted a paper 10 months ago

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 75

upvoted 2 collections 10 months ago

Augmentable

A collection of datasets that should be augmented further with gpt-4 • 13 items • Updated Jan 2 • 4

Hub Models

834 items • Updated 3 days ago • 5