This cross-architecture distillation, with Phi?

#14
by sometimesanotion - opened

Even though this model is superseded by Virtuoso Small, it's an outstanding achievement. Can this be done with Phi-4, particularly if Phi is Llamafied? Can Phi-4 be given strong function-calling or code autocomplete capabilities through distillation? I would really like a complete alternative to Qwen in the 14B parameter space.

We need to create a Discord server to keep everyone updated on our progress. We have two open-source releases coming soon (both in that size range), distilled from much larger models. Also, the teacher’s name rhymes with “Keepseek.”

I will gladly join that Discord. Thank you for your team's efforts. It's amazing what mergekit and evolkit can do with home hardware!
