Training Code

#1
by Gatozu35 - opened

Hello, is there code available for training the ProSparse models?

SparseLLMs org

The training of ProSparse is based on BMTrain and an unreleased version of CPM-Live. Our training paradigm is similar to the general pre-training of LLaMA, except for the ReLU activation and the progressive $L_1$ regularization loss on the FFN intermediate outputs (i.e., the output x in this line).
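For readers who want a starting point, the two differences described above might be sketched roughly as follows. This is not the ProSparse training code, which is unreleased. It is a minimal pure-Python illustration that assumes a LLaMA-style gated FFN with ReLU swapped in for the usual activation, and an $L_1$ penalty on the intermediate vector. The function names, the linear ramp in `progressive_lambda`, and the `lambda_max` value are all hypothetical and only stand in for whatever schedule the authors actually use.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def ffn_forward(x, w_gate, w_up, w_down):
    """Gated FFN with ReLU as the activation (assumed LLaMA-style layout).

    Returns the FFN output together with the intermediate vector,
    so that the caller can regularize the intermediate vector.
    """
    gate = relu(matvec(w_gate, x))
    up = matvec(w_up, x)
    inter = [g * u for g, u in zip(gate, up)]  # FFN intermediate output
    return matvec(w_down, inter), inter


def l1_penalty(inter):
    """Mean absolute value of the intermediate outputs."""
    return sum(abs(a) for a in inter) / len(inter)


def progressive_lambda(step, total_steps, lambda_max):
    """Hypothetical schedule: ramp the L1 factor linearly from 0 to lambda_max."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lambda_max * frac


def total_loss(task_loss, inter, step, total_steps, lambda_max=1e-2):
    """Task loss plus the progressively weighted L1 term."""
    return task_loss + progressive_lambda(step, total_steps, lambda_max) * l1_penalty(inter)


if __name__ == "__main__":
    x = [1.0, -2.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    out, inter = ffn_forward(x, identity, identity, [[1.0, 1.0]])
    for step in (0, 50, 100):
        print(step, total_loss(2.0, inter, step, total_steps=100, lambda_max=0.1))
```

In a real setup the same idea would be expressed in the training framework's tensor operations, with the penalty summed over all FFN layers and added to the language-modeling loss before the backward pass.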

Thank you for your reply! From this, I assume you are not going to release the code anytime soon. I appreciate the guidance though!

Gatozu35 changed discussion status to closed
