An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction Paper • 1811.00062 • Published Oct 31, 2018 • 2
mT5: A massively multilingual pre-trained text-to-text transformer Paper • 2010.11934 • Published Oct 22, 2020 • 4
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance Paper • 2310.10021 • Published Oct 16, 2023 • 2
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13, 2024 • 47
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer Paper • 2305.16380 • Published May 25, 2023 • 4
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning Paper • 2310.20587 • Published Oct 31, 2023 • 16
Structural Similarities Between Language Models and Neural Response Measurements Paper • 2306.01930 • Published Jun 2, 2023 • 2
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 37
A Thorough Examination of Decoding Methods in the Era of LLMs Paper • 2402.06925 • Published Feb 10, 2024 • 1
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering Paper • 2311.06668 • Published Nov 11, 2023 • 5