### Micro Mistral

This is a small Mistral model with 6 layers. It is similar to the smol llama variants in that it uses GQA and tied embeddings, but it follows the Mistral-style architecture, with GQA and sliding window attention.

This architecture combines GQA and tied embeddings to create an efficient 0.5B model that uses the Mistral architecture, which is supported in downstream applications.

#### Dataset

- Minipile
- Instruct
- Math
- OpenOrca
- Synthetic Data

TODO: Complete Dataset section
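
For readers unfamiliar with the two attention ideas named in the architecture description above, here is a minimal pure-Python sketch of how they work. It is not this model's implementation: the function names, the head counts, and the window size are hypothetical, chosen only for illustration. The exact window boundary convention (whether the window counts the current token) also varies between implementations; this sketch counts it.

```python
def sliding_window_causal_mask(seq_len, window):
    """Return mask[i][j] == True when query position i may attend to key j.

    A position sees itself and at most `window - 1` earlier positions,
    and never sees later positions (causal).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Return the index of the shared key/value head for a query head (GQA).

    Query heads are split into equal groups; every head in a group reuses
    the same key/value head, which shrinks the KV projections and cache.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Hypothetical sizes for illustration only, not the model's real config.
    for row in sliding_window_causal_mask(seq_len=4, window=2):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query_head(h, num_q_heads=8, num_kv_heads=2) for h in range(8)])
```

With a window of 2, each token attends only to itself and the previous token. With 8 query heads and 2 key/value heads, query heads 0 to 3 share the first key/value head and heads 4 to 7 share the second.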