Consider using the MPT-7B model to train the new Pygmalion model?

#9
by win10 - opened

Would you consider using the MPT-7B model to train the new Pygmalion model?

Pygmalion org

Yep, I'm considering it. I'm keeping a close eye on the new models being released. Considerations so far:

  • RedPajama is a very good alternative to LLaMA because of the license. However, NeoX (what RedPajama is based on) is a less popular architecture. LLaMA has a wider ecosystem (e.g. support in WebLLM) and is almost 40% faster at inference time, I've been told.
  • OpenLLaMA is based on the LLaMA architecture, so it's more promising in that regard. However, it hasn't finished training yet.
  • MPT is an entirely custom architecture. For now, there is no wider ecosystem support (e.g. koboldcpp), and several things are "broken" on it (e.g. it's currently impossible to train a LoRA, I've been told).

Still, fine-tuning the 65k-ctxlen version sounds like a fun project, so I'm considering it if I end up with idle GPUs.
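
For reference, here's a minimal sketch of what loading that 65k-ctxlen checkpoint for fine-tuning could look like with the standard transformers API. This is not the actual Pygmalion training setup; it assumes the mosaicml/mpt-7b-storywriter checkpoint and the GPT-NeoX tokenizer that MPT ships with.

```python
# Minimal sketch: load the 65k-context MPT checkpoint for fine-tuning.
# MPT uses custom modeling code, so trust_remote_code=True is required.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b-storywriter"  # assumed checkpoint, not confirmed in the thread

config = AutoConfig.from_pretrained(name, trust_remote_code=True)
# MPT uses ALiBi instead of learned position embeddings, so the context
# window is just a config value rather than a fixed embedding size.
config.max_seq_len = 65536

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# From here a standard causal-LM fine-tuning loop (e.g. transformers.Trainer)
# applies; note the caveat above that LoRA/PEFT support for MPT was still
# broken at the time of this discussion, so this assumes a full fine-tune.
```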
