---
language:
- en
tags:
- text generation
- pytorch
- causal-lm
license: mit
datasets:
- allenai/c4
- HuggingFaceFW/fineweb-edu
- togethercomputer/RedPajama-Data-V2
- Muennighoff/natural-instructions
- databricks/databricks-dolly-15k
- HuggingFaceTB/smollm-corpus
- open-phi/textbooks
- roneneldan/TinyStories
---

# Mixtress 135M

## Model Description

Mixtress 135M is a transformer model based on the [Mixtral](https://huggingface.co/docs/transformers/en/model_doc/mixtral) architecture. It is the culmination of approximately 20 weeks of [Kaggle](https://kaggle.com) free hours and 67 twelve-hour training runs.

The results are laughably bad. The model has massively overfit to the training data, and it saw far fewer tokens than other models of comparable size. But at least I can say we saw it through to completion!

## Training data

Mixtress was trained on a curated sampling of data from the following datasets:

- allenai/c4
- HuggingFaceFW/fineweb-edu
- togethercomputer/RedPajama-Data-V2
- Muennighoff/natural-instructions
- databricks/databricks-dolly-15k
- HuggingFaceTB/smollm-corpus
- open-phi/textbooks
- roneneldan/TinyStories

## Training procedure

This model was trained for 2.15 billion tokens over 20,000 optimizer steps. It was trained as an autoregressive language model with a causal attention mask, using cross-entropy loss. The final train loss was 1.941, the validation loss was 2.206, and the perplexity was 9.136.

Mixtress was pre-trained and fine-tuned simultaneously. Full reproduction code may be found [at this URL](https://www.kaggle.com/code/luciferianink/pretraining-a-mixtral), or in the Jupyter notebook [in this repository](./pretraining-a-mixtral.ipynb).

## Intended Use and Limitations

The model is best at what it was pretrained for: generating conversational text and answering questions from a prompt.

### How to use

You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run:

```py
>>> from transformers import pipeline
>>> generator = pipeline('text-generation', model='UNSAFE/Mixtress-135M')
>>> generator("In a shocking finding, ", do_sample=True, temperature=0.7, min_length=50)

[{'generated_text': 'In a shocking finding, 20 years ago, U.S. President Donald Trump'}]
```

## Eval results

All evaluations were done using the [EleutherAI LM evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).

### Scores

All scores are given as percentages.

| Model and Size    | ARC-easy  | ARC-challenge | HellaSwag | OpenBookQA | PiQA      |
| ----------------- | --------- | ------------- | --------- | ---------- | --------- |
| gpt-neo-125m      | 22.95     | N/A           | 30.26     | N/A        | N/A       |
| **Mixtress-135M** | **29.21** | **24.57**     | **26.99** | **21.80**  | **52.67** |

## Join Us

If you would like to chat with us, please join the [Discord](https://discord.gg/8ZmHP8CqUX) server!
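
## Reproducing the evals

For reference, here is a minimal sketch of how the scores above could be re-run with the evaluation harness's Python API. It assumes a recent `lm-evaluation-harness` release (0.4.x, `pip install lm-eval`) where `lm_eval.simple_evaluate` is available, and that the harness task names below correspond to the columns in the table; the exact harness version and settings used for the reported numbers are not recorded here, so treat this as a starting point rather than the exact command.

```py
# Sketch: re-running the benchmark tasks with lm-evaluation-harness (assumes 0.4.x).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face transformers backend
    model_args="pretrained=UNSAFE/Mixtress-135M",  # repository id from the usage example above
    tasks=["arc_easy", "arc_challenge", "hellaswag", "openbookqa", "piqa"],
    batch_size=8,
)

# Print the metrics reported for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```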