Collections

Discover the best community collections!

Collections including paper arxiv:1706.03762

AI seminal papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43

Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published 19 days ago • 136
Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 23 days ago • 83
An accurate detection is not all you need to combat label noise in web-noisy datasets

Paper • 2407.05528 • Published Jul 8 • 3
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

Paper • 2407.00402 • Published Jun 29 • 22

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43
Playing Atari with Deep Reinforcement Learning

Paper • 1312.5602 • Published Dec 19, 2013
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 11

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43

Finished Reading

Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published May 1 • 23
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 10
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 8

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43
LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15 • 87

LLM Fundamental papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 11
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 241

Ilya's papers for Carmack

Ilya Sutskever: "If you really learn all of these, you’ll know 90% of what matters today." Full list: https://punkx.org/jackdoe/30.html

Recurrent Neural Network Regularization

Paper • 1409.2329 • Published Sep 8, 2014
Pointer Networks

Paper • 1506.03134 • Published Jun 9, 2015
Order Matters: Sequence to sequence for sets

Paper • 1511.06391 • Published Nov 19, 2015
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Paper • 1811.06965 • Published Nov 16, 2018

Language model papers

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 10
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 29
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 10

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43
