just curious

#2
by 010O11 - opened

"The intuition being finetuning 8x1b should give better performance than finetuning 1b by itself." >> are you sure? how so? my intuition telling me the opposite, sorry for that...

Owner

Well, you are finetuning 8x1B (approx. 6.5B parameters) versus finetuning a single 1B model.

In the LLM space, bigger is almost always better. If not, then why is a 7B model not as good as a 70B?
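For a rough sense of where the ~6.5B figure comes from: in a Mixtral-style merge only the MLP block is duplicated per expert, while attention, embeddings, and norms stay shared, so eight 1B experts land well under 8B total. The numbers below assume TinyLlama-1.1B-style dimensions (22 layers, hidden size 2048, MLP size 5632, 32k vocab, grouped-query attention) purely as an illustration:

$$\text{params} \approx L\,(A + N \cdot M) + E$$

With $L = 22$ layers, $N = 8$ experts, per-expert MLP $M \approx 3 \cdot 2048 \cdot 5632 \approx 34.6\text{M}$, attention $A \approx 9.4\text{M}$, and embeddings plus LM head $E \approx 2 \cdot 32000 \cdot 2048 \approx 131\text{M}$:

$$22 \cdot (9.4\text{M} + 8 \cdot 34.6\text{M}) + 131\text{M} \approx 6.4\text{B}, \qquad 22 \cdot (9.4\text{M} + 34.6\text{M}) + 131\text{M} \approx 1.1\text{B}$$

The same dimensions give about 1.1B for the dense model, which is the sanity check that the assumed sizes are in the right ballpark.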


Hey, so I've been messing around with the mixtral branch of mergekit, and I'm just curious how you got your config to work. I'm trying to replicate it with the base model for education, and it throws a tremendous number of errors. Did you edit the mixtral branch further to fit your particular use case?

Owner

It worked out of the box for me, no changes. But it only works with Llama and Mistral architectures.
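For anyone else trying to reproduce this, here is a minimal sketch of what a mergekit-moe config on the mixtral branch generally looks like. The expert paths and prompts below are placeholders rather than the actual models used for this merge, and `gate_mode`/`dtype` are just common choices:

```yaml
# Minimal mergekit-moe config sketch (mixtral branch).
# Expert paths are placeholders, not the models used in this merge.
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0   # supplies the shared attention, embeddings, and norms
gate_mode: hidden        # route tokens by hidden-state similarity to each expert's positive prompts
dtype: bfloat16
experts:
  - source_model: path/to/expert-1   # e.g. a 1B Llama-architecture finetune
    positive_prompts:
      - "chat and general assistance"
  - source_model: path/to/expert-2
    positive_prompts:
      - "code and technical questions"
  # ...repeat until you have eight experts for an 8x1B merge
```

The entry point on that branch is typically `mergekit-moe config.yaml ./output-dir`. If it keeps erroring out, double-check that every expert shares the exact same architecture and tokenizer as `base_model`, since the merge only supports Llama- and Mistral-style models.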

eastwind changed discussion status to closed
