Collections

Collections including paper arxiv:2404.19737

LM Architectures

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 62
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 40
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8 • 28
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 132

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 75
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 102
ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 74
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4 • 58

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

Paper • 2403.09636 • Published Mar 14 • 2
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Paper • 2404.11912 • Published Apr 18 • 16
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

Paper • 2401.02669 • Published Jan 5 • 12
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 56

Interesting things.

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Paper • 2403.00745 • Published Mar 1 • 8
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 574
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26 • 23
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 106

Large Language Model (LLM) and NLP-related papers.

LoRA+: Efficient Low Rank Adaptation of Large Models

Paper • 2402.12354 • Published Feb 19 • 5
The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20 • 13
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20 • 10
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10 • 62

TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4 • 82
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24 • 41
SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 63
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29 • 46

15. Interesting. What is this? How does it work?

🍿 AiTube

Running on CPU Upgrade • 259
sshh12/Mistral-7B-LoRA-AudioCLAP

Updated Dec 13, 2023 • 7 • 4
microsoft/phi-1_5

Text Generation • Updated Apr 29 • 84.9k • 1.3k
stabilityai/stablecode-instruct-alpha-3b

Text Generation • Updated Aug 8, 2023 • 41 • 302

AutoMix: Automatically Mixing Language Models

Paper • 2310.12963 • Published Oct 19, 2023 • 14
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

Paper • 2310.03094 • Published Oct 4, 2023 • 12
MatFormer: Nested Transformer for Elastic Inference

Paper • 2310.07707 • Published Oct 11, 2023 • 1
DistillSpec: Improving Speculative Decoding via Knowledge Distillation

Paper • 2310.08461 • Published Oct 12, 2023 • 1

Large Language Models as Optimizers

Paper • 2309.03409 • Published Sep 7, 2023 • 72
Natural Language Supervision for General-Purpose Audio Representations

Paper • 2309.05767 • Published Sep 11, 2023 • 7
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 50
AudioSR: Versatile Audio Super-resolution at Scale

Paper • 2309.07314 • Published Sep 13, 2023 • 23
