Just curious about what to expect from this model.
#3 opened by stduhpf
I'm wondering, is it just a simple untrained "clown car" merge of TinyLlamas, or did you train it further as a "proper" MoE model already?
If it's the latter, I'm excited to try it out and see how it compares to 7B models.
Unfortunately, it's the former. However, I am sourcing a good instruction-tuning dataset to further train the MoE layer; I just haven't had the time so far.
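For anyone curious what "training only the MoE layer" might look like in practice, here is a minimal sketch in PyTorch/transformers, assuming a Mixtral-style checkpoint where each layer's router lives at `block_sparse_moe.gate` (the checkpoint path is a placeholder, not the actual repo):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder path: substitute the merged MoE checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/merged-moe",
    torch_dtype=torch.bfloat16,
)

# Freeze everything, then unfreeze only the router (gate) weights,
# so fine-tuning learns expert selection without touching the experts.
for name, param in model.named_parameters():
    param.requires_grad = "block_sparse_moe.gate" in name
```

After this, any standard instruction-tuning loop (e.g. the transformers `Trainer`) would update only the routers.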
I see. Inference speed seems very good, so if the output quality gets even close to 7B models once it's trained properly, this could be a very good alternative.
stduhpf changed discussion status to closed