- Larimar: Large Language Models with Episodic Memory Control
  Paper • 2403.11901 • Published • 30
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 55
Collections including paper arxiv:2404.19756

- Is Cosine-Similarity of Embeddings Really About Similarity?
  Paper • 2403.05440 • Published • 3
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning
  Paper • 2402.16829 • Published
- Make Your LLM Fully Utilize the Context
  Paper • 2404.16811 • Published • 52
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 102

- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
  Paper • 2403.05438 • Published • 15
- Chronos: Learning the Language of Time Series
  Paper • 2403.07815 • Published • 43
- Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
  Paper • 2403.06504 • Published • 52
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
  Paper • 2211.07600 • Published

- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 3
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 59

- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 19
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 107
- Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
  Paper • 2402.07827 • Published • 43

- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 45
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 47
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 22
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 22

- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
  Paper • 2312.02087 • Published • 19
- FaceStudio: Put Your Face Everywhere in Seconds
  Paper • 2312.02663 • Published • 28
- Orthogonal Adaptation for Modular Customization of Diffusion Models
  Paper • 2312.02432 • Published • 12
- ReconFusion: 3D Reconstruction with Diffusion Priors
  Paper • 2312.02981 • Published • 8