HMoE: Heterogeneous Mixture of Experts for Language Modeling Paper • 2408.10681 • Published Aug 20, 2024 • 8
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Paper • 2411.02265 • Published Nov 4, 2024 • 24
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published Jan 2025 • 25
Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs Paper • 2407.12117 • Published Jul 16, 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling Paper • 2405.14578 • Published May 23, 2024 • 1
More Expressive Attention with Negative Weights Paper • 2411.07176 • Published Nov 11, 2024 • 1
3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds Paper • 1707.06783 • Published Jul 21, 2017
FlexiTex: Enhancing Texture Generation with Visual Guidance Paper • 2409.12431 • Published Sep 19, 2024 • 12
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation Paper • 2310.13119 • Published Oct 19, 2023 • 11
Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields Paper • 2307.11335 • Published Jul 21, 2023
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published Sep 3, 2024 • 36