
Welcome to matlok

matlok

https://matlok.ai

matlok-ai

AI & ML interests

Welcome! We share large, open-source multimodal datasets for training and fine-tuning AI to write Python and build AI models. We also curate collections of guides, papers, datasets, models, and tools on topics such as frankenmerging AI models.

Recent Activity

upvoted a collection 5 days ago

Code Evaluation

updated a collection 5 days ago

Papers - Encoders - Roberta

updated a collection 5 days ago

Papers - Text - Bidirectional Encoders

Organizations

None yet

matlok's activity

upvoted a collection 5 days ago

Code Evaluation

Collection

Collection of Papers on Code Evaluation (from code generation language models) • 45 items • Updated Oct 29, 2024 • 15

upvoted 4 papers 5 days ago

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Paper • 2102.04664 • Published Feb 9, 2021 • 2

Deep Data Flow Analysis

Paper • 2012.01470 • Published Nov 21, 2020 • 1

Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligence

Paper • 2401.05375 • Published Dec 15, 2023 • 1

Compiling C to Safe Rust, Formalized

Paper • 2412.15042 • Published 15 days ago • 1

upvoted a paper 9 days ago

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Paper • 2410.20771 • Published Oct 28, 2024 • 3

upvoted a paper 11 days ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 22 days ago • 80

upvoted 3 papers 12 days ago

upvoted 2 papers 14 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 15 days ago • 334

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 16 days ago • 116

upvoted a collection 14 days ago

ModernBERT

Collection

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 15 days ago • 111

upvoted a paper 16 days ago

An Evolved Universal Transformer Memory

Paper • 2410.13166 • Published Oct 17, 2024 • 3

upvoted 2 papers 18 days ago

Phi-4 Technical Report

Paper • 2412.08905 • Published 23 days ago • 95

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 21 days ago • 136

upvoted 2 papers 19 days ago

CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

Paper • 2103.06874 • Published Mar 11, 2021 • 1

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3

upvoted 2 papers 20 days ago

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

Paper • 2108.06209 • Published Aug 7, 2021 • 1

StarCraft II: A New Challenge for Reinforcement Learning

Paper • 1708.04782 • Published Aug 16, 2017 • 1