
Original model link is 404

#2 opened by klotz

https://huggingface.co/cognitivecomputations/dolphin-2.6.1-mixtral-8x7b returns a 404, and nothing matching dolphin-2.6.1* seems to be available either.

Yeah, I was about to push a change. Eric saw that performance decreased with 2.6.1, so he pulled it. I'll leave mine up for anyone who wants it, but he's working on retraining.

The leading theory is that Axolotl's transformers build doesn't properly train the MoE router, so it's "naive in backpropagation". 2.7, or whatever he ends up calling it, will train the routing properly and should have much higher performance.
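If you want to sanity-check whether router weights get gradients at all, here's a minimal sketch (assuming transformers >= 4.36 with Mixtral support; the `block_sparse_moe.gate` parameter names follow HF's Mixtral implementation, and this checks a plain forward/backward, not Axolotl's actual training path). It builds a tiny random Mixtral from config so nothing needs downloading:

```python
import torch
from transformers import MixtralConfig, MixtralForCausalLM

# Tiny random Mixtral so the check runs on CPU in seconds.
config = MixtralConfig(
    vocab_size=128,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_local_experts=4,
    num_experts_per_tok=2,
)
model = MixtralForCausalLM(config)

# One dummy training step: forward with labels yields a causal-LM loss.
input_ids = torch.randint(0, config.vocab_size, (1, 16))
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()

# The router in HF's Mixtral lives at layers.{i}.block_sparse_moe.gate;
# a None or zero grad here would mean the router isn't being trained.
for name, param in model.named_parameters():
    if ".block_sparse_moe.gate." in name:
        grad = param.grad
        print(name,
              "requires_grad:", param.requires_grad,
              "grad norm:", None if grad is None else grad.norm().item())
```

On a healthy build every gate weight should report requires_grad=True and a nonzero grad norm after backward.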

Thank you for the status and the great repo!
