TODO: Complete. This architecture combines grouped-query attention (GQA) and tied embeddings to create an efficient 5B-parameter model. It uses a mix of data that is yet to be published.
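
To illustrate why GQA and tied embeddings help with efficiency, here is a rough parameter-count sketch. All sizes below (hidden size, head counts, vocabulary size) are hypothetical values chosen for illustration and are not this model's actual configuration; the helper names are likewise made up for this example. It assumes bias-free projections.

```python
def attention_params(d_model, n_heads, n_kv_heads):
    """Weight count for one attention block (no biases assumed)."""
    head_dim = d_model // n_heads
    q = d_model * n_heads * head_dim           # query projection
    kv = 2 * d_model * n_kv_heads * head_dim   # key and value projections
    o = n_heads * head_dim * d_model           # output projection
    return q + kv + o


def embedding_params(vocab_size, d_model, tied):
    """Input embedding plus output head; tying shares a single matrix."""
    table = vocab_size * d_model
    return table if tied else 2 * table


# Hypothetical sizes, NOT the real configuration of this model.
D_MODEL, N_HEADS, N_KV_HEADS, VOCAB = 4096, 32, 8, 32000

mha = attention_params(D_MODEL, N_HEADS, N_HEADS)     # every head has its own K/V
gqa = attention_params(D_MODEL, N_HEADS, N_KV_HEADS)  # groups of heads share K/V
tied = embedding_params(VOCAB, D_MODEL, tied=True)
untied = embedding_params(VOCAB, D_MODEL, tied=False)

print(f"attention weights per layer: MHA={mha:,}  GQA={gqa:,}")
print(f"embedding weights: untied={untied:,}  tied={tied:,}")
```

With these assumed sizes, GQA shrinks the key/value projections by the ratio of query heads to key/value heads (4x here), and the key/value cache at inference time shrinks by the same ratio. Tying the embeddings removes one full vocabulary-by-hidden matrix.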