just curious

#2
by 010O11 - opened

"The intuition being finetuning 8x1b should give better performance than finetuning 1b by itself." >> are you sure? how so? my intuition telling me the opposite, sorry for that...

Owner

Well, you are finetuning 8x1B (approx. 6.5B parameters) versus finetuning a single 1B model.

In the LLM space, bigger is almost always better. If not, then why is a 7B model not as good as a 70B?
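For a rough sense of where the ~6.5B figure comes from: in a Mixtral-style merge only the MLP block is duplicated per expert, while attention, embeddings, and norms stay shared, so eight 1B experts land well under 8B total. The numbers below assume TinyLlama-1.1B-style dimensions (22 layers, hidden size 2048, MLP size 5632, 32k vocab, grouped-query attention) purely as an illustration:

$$\text{params} \approx L\,(A + N \cdot M) + E$$

With $L = 22$ layers, $N = 8$ experts, per-expert MLP $M \approx 3 \cdot 2048 \cdot 5632 \approx 34.6\text{M}$, attention $A \approx 9.4\text{M}$, and embeddings plus LM head $E \approx 2 \cdot 32000 \cdot 2048 \approx 131\text{M}$:

$$22 \cdot (9.4\text{M} + 8 \cdot 34.6\text{M}) + 131\text{M} \approx 6.4\text{B}, \qquad 22 \cdot (9.4\text{M} + 34.6\text{M}) + 131\text{M} \approx 1.1\text{B}$$

The same dimensions give about 1.1B for the dense model, which is the sanity check that the assumed sizes are in the right ballpark.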


Hey, so I've been messing around with the mixtral branch of mergekit, and I'm just curious how you got your config to work. I'm trying to replicate it with the base model for education, and it throws a tremendous number of errors. Did you edit the mixtral branch further to fit your particular use case?

Owner

It worked out of the box for me, no changes. But it only works with Llama and Mistral architectures.
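For anyone else trying to reproduce this, here is a minimal sketch of what a mergekit-moe config on the mixtral branch generally looks like. The expert paths and prompts below are placeholders rather than the actual models used for this merge, and `gate_mode`/`dtype` are just common choices:

```yaml
# Minimal mergekit-moe config sketch (mixtral branch).
# Expert paths are placeholders, not the models used in this merge.
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0   # supplies the shared attention, embeddings, and norms
gate_mode: hidden        # route tokens by hidden-state similarity to each expert's positive prompts
dtype: bfloat16
experts:
  - source_model: path/to/expert-1   # e.g. a 1B Llama-architecture finetune
    positive_prompts:
      - "chat and general assistance"
  - source_model: path/to/expert-2
    positive_prompts:
      - "code and technical questions"
  # ...repeat until you have eight experts for an 8x1B merge
```

The entry point on that branch is typically `mergekit-moe config.yaml ./output-dir`. If it keeps erroring out, double-check that every expert shares the exact same architecture and tokenizer as `base_model`, since the merge only supports Llama- and Mistral-style models.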

eastwind changed discussion status to closed
