Peter Szemraj's picture

Peter Szemraj PRO

pszemraj

·

https://pszemraj.carrd.co/

pszemraj

AI & ML interests

metallic intuition

Recent Activity

liked a model 2 days ago

AtlaAI/Selene-1-Mini-Llama-3.1-8B

updated a model 2 days ago

pszemraj/ModernBERT2gpt2-700m-v0.1

liked a Space 3 days ago

PramaLLC/BEN2

View all activity

Organizations

pszemraj's activity

upvoted 2 collections 9 days ago

NeMo Curator - Classifier Models

Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 9 items • Updated 16 days ago • 13

SmolVLM 256M & 500M

Collection for models & demos for even smoller SmolVLM release • 12 items • Updated 9 days ago • 63

upvoted an article 9 days ago

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

10 days ago

• 102

upvoted a collection 15 days ago

Deita

14 items • Updated May 20, 2024 • 11

upvoted 3 papers 16 days ago

An Empirical Study of Autoregressive Pre-training from Videos

Paper • 2501.05453 • Published 23 days ago • 37

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published 18 days ago • 52

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published 22 days ago • 79

upvoted a collection 25 days ago

DolphinLabeled Datasets

Eric Hartford has added labels to help you filter datasets, for your pleasure. • 5 items • Updated 26 days ago • 10

upvoted a paper 28 days ago

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published Dec 24, 2024 • 72

upvoted a collection about 1 month ago

Embedding Model Datasets

A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3, 2024 • 101

upvoted 4 papers about 1 month ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 48

LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

Paper • 2412.15204 • Published Dec 19, 2024 • 33

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 344

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

upvoted a collection about 1 month ago

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 130

upvoted 3 papers about 1 month ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 89

OmniPred: Language Models as Universal Regressors

Paper • 2402.14547 • Published Feb 22, 2024 • 12

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Paper • 2404.07544 • Published Apr 11, 2024 • 20

upvoted 2 papers about 2 months ago

Open-Sora Plan: Open-Source Large Video Generation Model

Paper • 2412.00131 • Published Nov 28, 2024 • 33

Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability

Paper • 2411.19943 • Published Nov 29, 2024 • 57