Just curious about what to expect from this model.

#3
by stduhpf - opened

I'm wondering, is it just a simple untrained "clown car" merge of TinyLlamas, or did you train it further as a "proper" MoE model already?
If it's the latter, I'm excited to try it out and see how it compares to 7B models.
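
To be clear about terms: by "clown car" I mean the experts are copied in as-is and the router choosing between them is just randomly initialized, so routing carries no learned signal. Something like this toy sketch (dimensions made up purely for illustration, not taken from this repo):

```python
# Toy sketch (not this repo's code): an untrained "clown car" router is just
# a randomly initialized linear gate, so expert selection is arbitrary.
import torch

hidden_size, num_experts, top_k = 2048, 4, 2  # made-up example dimensions
gate = torch.nn.Linear(hidden_size, num_experts, bias=False)  # random init

x = torch.randn(1, hidden_size)            # one token's hidden state
scores = torch.softmax(gate(x), dim=-1)    # routing probabilities
weights, experts = scores.topk(top_k)      # top-k experts, arbitrary here
print("chosen experts:", experts.tolist(), "weights:", weights.tolist())
```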

Unfortunately, it's the former. However, I am sourcing a good instruction-tuning dataset to further train the MoE layers; I just haven't had the time so far.
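
Roughly, a minimal sketch of what "training the MoE layers" could look like: freeze the merged experts and train only the routers. This assumes a Mixtral-style layout where the per-layer router weights are named `block_sparse_moe.gate`, and the model path below is a placeholder:

```python
# Minimal sketch (assumes a Mixtral-style architecture; path is hypothetical).
# Freeze everything except the MoE routers so only the gating gets trained.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/merged-moe")  # placeholder

for name, param in model.named_parameters():
    # Mixtral names its per-layer router "block_sparse_moe.gate".
    param.requires_grad = "block_sparse_moe.gate" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable router params: {trainable:,}")
```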

I see. Inference speed seems very good, so if the output quality gets even close to 7B models once it's trained properly, this could be a very good alternative.

stduhpf changed discussion status to closed
