Link to the code + set-up used?

#6
by mark-arts - opened

Hi! Really awesome project, and congrats on only dropping a couple of points to 76~77 MMLU! πŸ”₯

Beating Claude Haiku and Mixtral, and even trading blows with Bigxtral, at 42b is wild 🤯
I had a pipe dream of doing something similar, but with the 8b version to make a 4~5b Llama-3. Kind of ambitious, since I'd be hoping to do it in MLX. I was also hoping to document the whole thing in a Jupyter Notebook and open-source it to the community. Is there any chance you could share the code / pipeline you used for the layer selection and pruning? (I've sketched below the kind of thing I was imagining.)
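
Just to be clear, this is not a claim about your actual pipeline, only a rough sketch of the idea I had in mind: score each contiguous block of layers by how little it changes the residual stream on a small calibration set (cosine similarity between hidden states n layers apart), then drop the most redundant block. The model id, `n_drop`, and the tiny calibration text are all placeholders.

```python
# Rough sketch of similarity-based layer selection, NOT the authors' pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # assumption: 8B base model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

n_drop = 8                  # how many consecutive layers to prune (hypothetical)
calib_texts = ["The quick brown fox jumps over the lazy dog."]  # use real calibration data

# Accumulate, over the calibration set, the cosine similarity between the
# hidden state entering layer i and the one leaving layer i + n_drop - 1.
sims = None
with torch.no_grad():
    for text in calib_texts:
        ids = tok(text, return_tensors="pt").to(model.device)
        hs = model(**ids, output_hidden_states=True).hidden_states  # num_layers + 1 tensors
        cur = torch.stack([
            torch.nn.functional.cosine_similarity(
                hs[i].flatten(1), hs[i + n_drop].flatten(1)
            ).mean()
            for i in range(len(hs) - n_drop)
        ])
        sims = cur if sims is None else sims + cur

# The block whose output barely moves the residual stream is the candidate for removal.
start = int(torch.argmax(sims))
print(f"Dropping layers {start}..{start + n_drop - 1}")

# Physically remove the block and fix up the config.
keep = [l for i, l in enumerate(model.model.layers) if not (start <= i < start + n_drop)]
model.model.layers = torch.nn.ModuleList(keep)
model.config.num_hidden_layers = len(keep)
model.save_pretrained("llama-3-pruned")  # a short healing finetune would normally follow
```

No idea if that's anywhere close to what you did, which is exactly why I'd love to see the real code.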

Also eagerly waiting to see if you try the same thing on the 70b-instruct!

Hi! I agree that it is an awesome project :)
I would also love to see something like this on the instruction-tuned model!!! @mark-arts have you found anything in that regard since you posted this comment?
Thanks in advance
