Collections
Community collections that include the paper arXiv:2407.02678, "Reasoning in Large Language Models: A Geometric Perspective".
Collection 1
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings (Paper • 2305.13571 • Published • 2)
- Transformers Can Represent n-gram Language Models (Paper • 2404.14994 • Published • 18)
- Are Sixteen Heads Really Better than One? (Paper • 1905.10650 • Published • 2)
- Reasoning in Large Language Models: A Geometric Perspective (Paper • 2407.02678 • Published • 1)

Collection 2
- Explainable Lung Disease Classification from Chest X-Ray Images Utilizing Deep Learning and XAI (Paper • 2404.11428 • Published • 1)
- A Multimodal Automated Interpretability Agent (Paper • 2404.14394 • Published • 20)
- What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation (Paper • 2404.07129 • Published • 3)
- The Geometry of Categorical and Hierarchical Concepts in Large Language Models (Paper • 2406.01506 • Published • 3)

Collection 3
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs (Paper • 2312.17080 • Published • 1)
- Premise Order Matters in Reasoning with Large Language Models (Paper • 2402.08939 • Published • 25)
- Reasoning in Large Language Models: A Geometric Perspective (Paper • 2407.02678 • Published • 1)

Collection 4
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models (Paper • 2402.14848 • Published • 18)
- Teaching Large Language Models to Reason with Reinforcement Learning (Paper • 2403.04642 • Published • 46)
- How Far Are We from Intelligent Visual Deductive Reasoning? (Paper • 2403.04732 • Published • 18)
- Learning to Reason and Memorize with Self-Notes (Paper • 2305.00833 • Published • 4)

Collection 5
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models (Paper • 2402.10644 • Published • 78)
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (Paper • 2305.13245 • Published • 5)
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition (Paper • 2402.15220 • Published • 19)
- Sequence Parallelism: Long Sequence Training from System Perspective (Paper • 2105.13120 • Published • 5)