Just curious about what to expect from this model.
#3 opened by stduhpf
I'm wondering, is it just a simple untrained "clown car" merge of TinyLlamas, or did you train it further as a "proper" MoE model already?
If it's the latter, I'm excited to try it out and see how it compares to 7B models.
Unfortunately, it's the former. However, I am sourcing a good instruction-tuning dataset to further train the MoE layer; I just haven't had the time so far.
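For anyone curious what "training only the MoE layer" might look like in practice, here is a minimal sketch in PyTorch/transformers, assuming a Mixtral-style checkpoint where each layer's router lives at `block_sparse_moe.gate` (the checkpoint path is a placeholder, not the actual repo):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder path: substitute the merged MoE checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/merged-moe",
    torch_dtype=torch.bfloat16,
)

# Freeze everything, then unfreeze only the router (gate) weights,
# so fine-tuning learns expert selection without touching the experts.
for name, param in model.named_parameters():
    param.requires_grad = "block_sparse_moe.gate" in name
```

After this, any standard instruction-tuning loop (e.g. the transformers `Trainer`) would update only the routers.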
I see. Inference speed seems very good, so if the output quality gets even close to 7B models once it's trained properly, this could be a very good alternative.
stduhpf changed discussion status to closed