Collections
Community collections that include the paper arXiv:2407.02678, "Reasoning in Large Language Models: A Geometric Perspective".
Collection 1
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings (Paper • 2305.13571 • Published • 2)
- Transformers Can Represent n-gram Language Models (Paper • 2404.14994 • Published • 18)
- Are Sixteen Heads Really Better than One? (Paper • 1905.10650 • Published • 2)
- Reasoning in Large Language Models: A Geometric Perspective (Paper • 2407.02678 • Published • 1)

Collection 2
- Explainable Lung Disease Classification from Chest X-Ray Images Utilizing Deep Learning and XAI (Paper • 2404.11428 • Published • 1)
- A Multimodal Automated Interpretability Agent (Paper • 2404.14394 • Published • 20)
- What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation (Paper • 2404.07129 • Published • 3)
- The Geometry of Categorical and Hierarchical Concepts in Large Language Models (Paper • 2406.01506 • Published • 3)

Collection 3
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs (Paper • 2312.17080 • Published • 1)
- Premise Order Matters in Reasoning with Large Language Models (Paper • 2402.08939 • Published • 25)
- Reasoning in Large Language Models: A Geometric Perspective (Paper • 2407.02678 • Published • 1)

Collection 4
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models (Paper • 2402.14848 • Published • 18)
- Teaching Large Language Models to Reason with Reinforcement Learning (Paper • 2403.04642 • Published • 46)
- How Far Are We from Intelligent Visual Deductive Reasoning? (Paper • 2403.04732 • Published • 18)
- Learning to Reason and Memorize with Self-Notes (Paper • 2305.00833 • Published • 4)

Collection 5
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models (Paper • 2402.10644 • Published • 78)
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (Paper • 2305.13245 • Published • 5)
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition (Paper • 2402.15220 • Published • 19)
- Sequence Parallelism: Long Sequence Training from System Perspective (Paper • 2105.13120 • Published • 5)