The Dataset

#32 opened by bibiicekill

Hello PygmalionAI Team,
Amazing work! I tried Pygmalion-6B (GPT-J); it's already pretty good, and it would be even better fine-tuned on a specific domain!

I have the following two questions:

  • Is it possible to get the data you trained Pygmalion-6B (GPT-J) on, so I can try it on other models myself?
  • Do you plan to train MPT-7B with this data?
Pygmalion org

For the data, sure! Shoot me a message on Discord - I'm 0x000011b#4223 there. As for MPT-7B, I don't have any plans to train it at the moment. I'm not a fan of the current 7B (XOR + license + model itself isn't as good as I'd like), so I'm keeping an eye on all the new foundational models that are coming out, but my current thoughts are:

  • MPT-7B looks strong performance-wise, but the fact that it's a custom architecture full of NotImplementedErrors when training doesn't inspire confidence for me to use it just yet (see the loading sketch after this list).
  • RedPajama's 7B looks great! However, for whatever reason, LLaMA is about 40% faster than NeoX (the architecture that RedPajama used), so this is also not 100% ideal.
  • OpenLLaMA seems the most promising: it will use the normal LLaMA architecture (so it avoids the two pitfalls above), and it's being trained on the same data as RedPajama, so once finished the two should be competitive in model quality. However, it's not done training yet, and the current checkpoints still significantly underperform LLaMA on some tasks, so I'd rather not rush anything.
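
As a rough illustration of the architecture point above (a minimal sketch; the Hub model IDs are my assumption of the usual ones), MPT-7B ships its own modeling code and has to be loaded with trust_remote_code=True, while an OpenLLaMA checkpoint goes through the stock LLaMA classes:

```python
from transformers import AutoModelForCausalLM

# MPT-7B's modeling code lives inside the model repo, so transformers must be
# allowed to execute that custom code; training paths that the stock classes
# provide may simply not be implemented there.
mpt = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",        # assumed Hub ID
    trust_remote_code=True,   # required because the architecture is custom
)

# OpenLLaMA reuses the standard LLaMA architecture, so the built-in
# LlamaForCausalLM class is used and the usual fine-tuning paths just work.
openllama = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_7b"  # assumed Hub ID for an OpenLLaMA checkpoint
)
```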

Can you give me the dataset, please?

@11b the Discord username has changed. Can you please provide the dataset or your updated username?

Done, please check.

@alpindale I am eagerly waiting for your dataset to be released. Is there a finalized release date yet?

Does anybody have access to the dataset in 2024?
