Wow! What a model. Didn't know it could be trained with Axolotl

#5
by Acrious - opened

I was briefly flirting with the idea of training a Mistral-7B model with Axolotl, but I didn't think it was supported yet. It isn't listed as supported on the GitHub page, anyway. Is Mistral similar enough to one of these to train: llama, Pythia, cerebras, btlm, mpt, falcon, gpt-j, XGen, phi?

Acrious changed discussion title from Wow! What a model. Didn't know it could be trained with Axylotyl to Wow! What a model. Didn't know it could be trained with Axolotl

It does use the llama architecture, so technically it could be trained?

Cognitive Computations org

Axolotl can train many things, including Mistral.

ehartford changed discussion status to closed

All answers are questions so far... Bing-go
Searching for: Mistral-7b LLM model LLAMA architecture
Yes, the Mistral-7b LLM model uses the LLAMA architecture [1]. The Mistral-7b model is a 7-billion-parameter language model that is engineered for superior performance and efficiency. It outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation [1]. The model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost [1]. The Mistral-7b – Instruct model is fine-tuned to follow instructions and surpasses the Llama 2 13B – Chat model both on human and automated benchmarks [1]. The models are released under the Apache 2.0 license [1].

Cool!

Cognitive Computations org
edited Oct 16, 2023

Mistral does not use the Llama architecture: LLaMA and Llama 2 are LlamaForCausalLM, while Mistral is MistralForCausalLM. Mistral also uses SWA and GQA, which Llama doesn't (at least the 7B models don't use GQA). I think Mistral also has a slightly different layer distribution. Also, no, neither LLaMA nor Llama 2 is released under Apache 2.0; Mistral is, though. I think Mistral is very slightly larger than LLaMA and Llama 2 too. I'm not sure whether Mistral has been evaluated on human benchmarks, and I'm not sure where that claim came from.

@umarmnaq @Acrious
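To put rough numbers on the "very slightly larger" and GQA points above, here is a minimal back-of-the-envelope sketch. The layer shapes are assumptions taken from the publicly released Llama 2 7B and Mistral 7B configs, and the count assumes no bias terms and untied input/output embeddings. `DecoderShape` and `count_parameters` are hypothetical helper names, not part of Axolotl or Transformers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderShape:
    """Shape hyperparameters of a decoder-only transformer (illustrative helper)."""
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_layers: int
    num_heads: int
    num_kv_heads: int  # fewer KV heads than query heads means GQA


def count_parameters(s: DecoderShape) -> int:
    """Approximate parameter count, assuming no biases and untied embeddings."""
    head_dim = s.hidden_size // s.num_heads
    kv_dim = head_dim * s.num_kv_heads
    # Q and O projections are hidden x hidden; K and V shrink under GQA.
    attention = 2 * s.hidden_size * s.hidden_size + 2 * s.hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * s.hidden_size * s.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * s.hidden_size
    per_layer = attention + mlp + norms
    # Input embedding table plus a separate output head.
    embeddings = 2 * s.vocab_size * s.hidden_size
    final_norm = s.hidden_size
    return s.num_layers * per_layer + embeddings + final_norm


# Assumed values, copied from the published model configs.
LLAMA_2_7B = DecoderShape(32000, 4096, 11008, 32, 32, 32)
MISTRAL_7B = DecoderShape(32000, 4096, 14336, 32, 32, 8)

if __name__ == "__main__":
    for name, shape in [("Llama 2 7B", LLAMA_2_7B), ("Mistral 7B", MISTRAL_7B)]:
        print(f"{name}: {count_parameters(shape) / 1e9:.2f}B parameters")
```

Under these assumptions, the smaller K/V projections from GQA save parameters in attention, but the wider MLP more than makes up for it, so the Mistral total comes out somewhat higher.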

Awww, Bing-fail... That happens a lot. Weird that the author's post wasn't there when I commented... Thanks Hartford, Dolphin is amazing. I wonder why Axolotl doesn't advertise support for Mistral?

Cognitive Computations org


I guess it's because it doesn't really fully support Mistral: it doesn't support SWA yet, so it just works without SWA. (That's why the author's config file has an 8k context instead of the 32k you'd get with SWA.)
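For anyone unsure what SWA changes, here is a toy sketch of a sliding-window causal mask next to a plain causal one. It is only an illustration of the idea, not Axolotl's or Mistral's actual code. The function name is hypothetical, and the convention that the window counts the current token is an assumption.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    A position sees itself and at most `window - 1` earlier positions.
    Setting window >= seq_len reduces this to an ordinary causal mask.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print a small mask: 'x' marks allowed attention, '.' marks blocked.
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With full causal attention, each new token attends to every earlier token, so the cost per token grows with sequence length. With a window, it is capped at the window size, which is what makes very long contexts cheaper.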
