Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.01093

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 17
LLM Augmented LLMs: Expanding Capabilities through Composition

Paper • 2401.02412 • Published Jan 4 • 36
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11 • 42
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16 • 20

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Paper • 2309.04662 • Published Sep 9, 2023 • 22
Neurons in Large Language Models: Dead, N-gram, Positional

Paper • 2309.04827 • Published Sep 9, 2023 • 16
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Paper • 2309.05516 • Published Sep 11, 2023 • 9
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs

Paper • 2309.03907 • Published May 18, 2023 • 8

Empowering SLMs

Collection of resources focusing on making Small Language Models (SLMs) better at various tasks

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 124
Scaling Laws for Downstream Task Performance of Large Language Models

Paper • 2402.04177 • Published Feb 6 • 17
Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 70
Orca-Math: Unlocking the potential of SLMs in Grade School Math

Paper • 2402.14830 • Published Feb 16 • 24

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 99
How to Train Data-Efficient LLMs

Paper • 2402.09668 • Published Feb 15 • 38
BitDelta: Your Fine-Tune May Only Be Worth One Bit

Paper • 2402.10193 • Published Feb 15 • 17
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Paper • 2402.09727 • Published Feb 15 • 35

Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2 • 45

Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2 • 45
Computing Power and the Governance of Artificial Intelligence

Paper • 2402.08797 • Published Feb 13 • 11
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4 • 61

Efficient Training

Rethinking Optimization and Architecture for Tiny Language Models

Paper • 2402.02791 • Published Feb 5 • 12
Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2 • 45
Scavenging Hyena: Distilling Transformers into Long Convolution Models

Paper • 2401.17574 • Published Jan 31 • 15
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4 • 61

Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2 • 45
Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5 • 86
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3 • 80
jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Paper • 2409.10173 • Published Sep 16 • 23

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1 • 21
Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2 • 45
Premise Order Matters in Reasoning with Large Language Models

Paper • 2402.08939 • Published Feb 14 • 25

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30 • 16
Transforming and Combining Rewards for Aligning Large Language Models

Paper • 2402.00742 • Published Feb 1 • 11
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5 • 67
Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2 • 45

Previous
1
2
3
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs