Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2004.06343

Papers - Coding - BPE vs Pointer Mixture Network

Code Completion using Neural Attention and Byte Pair Encoding

Paper • 2004.06343 • Published Apr 14, 2020 • 2

Papers - Coding - Out of Vocabulary

Code Completion using Neural Attention and Byte Pair Encoding

Paper • 2004.06343 • Published Apr 14, 2020 • 2

Papers - Text - Training - Code - Byte Pair Encoding

Code Completion using Neural Attention and Byte Pair Encoding

Paper • 2004.06343 • Published Apr 14, 2020 • 2

Papers - Attention

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16 • 78
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23 • 19
Sequence Parallelism: Long Sequence Training from System Perspective

Paper • 2105.13120 • Published May 26, 2021 • 5

There's usually interesting papers in the model cards on the leaderboard: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard

StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 29
WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Paper • 2306.08568 • Published Jun 14, 2023 • 28
SantaCoder: don't reach for the stars!

Paper • 2301.03988 • Published Jan 9, 2023 • 7
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25 • 46

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs