- Re3: Generating Longer Stories With Recursive Reprompting and Revision
  Paper • 2210.06774 • Published • 2
- Constitutional AI: Harmlessness from AI Feedback
  Paper • 2212.08073 • Published • 1
- AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
  Paper • 2402.04253 • Published
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
  Paper • 2305.19118 • Published
Collections including paper arxiv:2404.07143

- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 17
- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 19
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 19
- BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
  Paper • 2403.09347 • Published • 20

- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 46
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 50
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 40
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 19

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 41
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 50
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 10
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 40
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 12
- A Neural Conversational Model
  Paper • 1506.05869 • Published • 2
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 18

- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 74
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 18
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5

- Neural Network Diffusion
  Paper • 2402.13144 • Published • 94
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 68
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 40

- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 74
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 22
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2

- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 25
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
  Paper • 2305.07185 • Published • 8
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 59
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 12