Submitted by akhaliq 179 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection · 6 authors 13
Submitted by akhaliq 61 ShortGPT: Layers in Large Language Models are More Redundant Than You Expect · 8 authors 21
Submitted by akhaliq 17 Learning to Decode Collaboratively with Multiple Language Models · 5 authors 6
Submitted by akhaliq 11 Stop Regressing: Training Value Functions via Classification for Scalable Deep RL · 12 authors 1
Submitted by akhaliq 11 Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling · 6 authors 1