--- library_name: transformers tags: - moe - moah license: apache-2.0 datasets: - Locutusque/UltraTextbooks language: - en --- # Model Card for Model ID ## Model Details ### Model Description This Model is a first test to combine [Jamba](https://huggingface.co/ai21labs/Jamba-v0.1) architecture with 1.58 bits linear layers and mixture of attention head. The goal is to developpe and test if this kind of architectures have not too much quality loss for a fast inference. - **Model type:** Mixture of attention head and mixture of expert 1.58bit linear layers - **License:** Apache licence 2.0 ### Model Sources [optional] - **Repository:** https://github.com/ostix360/optimized-LLM ## How to Get Started with the Model If you want to test this model please look at this repo at this [commit](https://github.com/ostix360/optimized-LLM/tree/8878e0f0bd764f85ce2ea56790a95f9837fb2fe4) ## Training Details ### Training Data We use the first 100k data of Locutusque/UltraTextbooks to train this model ### Training Procedure We use adam-8 bits with default betas and epsilon values #### Preprocessing [optional] The data fit the model max length i.e. 512 tokens #### Training Hyperparameters Please look at this file to see the hyperparameters ## Technical Specifications [optional] ### Compute Infrastructure #### Hardware - one 4070 ti GPU #### Software - pytorch, transformers etc