Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published 21 days ago • 13
HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model Paper • 2502.10807 • Published 27 days ago • 3
NatureLM: Deciphering the Language of Nature for Scientific Discovery Paper • 2502.07527 • Published about 1 month ago • 19
🪄 LayerDiffuse - Flux Version (Demo): Transparent Image Layer Diffusion using Latent Transparency. Demo: https://huggingface.co/spaces/eienmojiki/Flux-LayerDiffuse
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 26
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference Paper • 2410.21262 • Published Oct 28, 2024 • 1
In-Context Learning Dynamics with Random Binary Sequences Paper • 2310.17639 • Published Oct 26, 2023
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space Paper • 2406.19370 • Published Jun 27, 2024 • 1
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks Paper • 2311.12786 • Published Nov 21, 2023 • 2
Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model Paper • 2402.07757 • Published Feb 12, 2024
A Survey on Dialog Management: Recent Advances and Challenges Paper • 2005.02233 • Published May 5, 2020
SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents Paper • 2305.13040 • Published May 22, 2023 • 2
GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection Paper • 2111.14592 • Published Nov 29, 2021 • 1
Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation Paper • 2310.07968 • Published Oct 12, 2023
Preview, Attend and Review: Schema-Aware Curriculum Learning for Multi-Domain Dialog State Tracking Paper • 2106.00291 • Published Jun 1, 2021
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning Paper • 2409.14674 • Published Sep 23, 2024 • 43
Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks Paper • 2310.18882 • Published Oct 29, 2023 • 1