---
library_name: nanotron
---
# ⚙️ Nano-Mistral

Modeling code for Mistral to use with [Nanotron](https://github.com/huggingface/nanotron/)

## 🚀 Quickstart
```bash
# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1  # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
```
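Before launching, you can sanity-check the generated YAML. Below is a minimal sketch, assuming the config contains a `parallelism` section; the key names are illustrative and may differ across nanotron versions:

```python
# Sanity-check the generated config before launching torchrun.
# Key names are illustrative and may differ across nanotron versions.
import yaml

with open("config_tiny_mistral.yaml") as f:
    config = yaml.safe_load(f)

# The product of the parallelism degrees (dp * tp * pp) should match
# the total number of processes passed to --nproc_per_node.
print(config["parallelism"])
```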
## 🤖 Use your custom model

- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration (a sketch of such a config follows this list)
- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
- Pass both classes to the `DistributedTrainer` in `run_train.py`:
```python
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
```
- Run training as usual
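
For orientation, here is a minimal sketch of the kind of dataclass `config_tiny_mistral.py` defines. Field names and defaults are illustrative (loosely following Hugging Face's Mistral hyperparameter names), not the repository's actual values:

```python
from dataclasses import dataclass


@dataclass
class MistralConfig:
    """Illustrative model config; the real class may define more fields."""

    hidden_size: int = 512
    intermediate_size: int = 2048
    num_hidden_layers: int = 8
    num_attention_heads: int = 8
    num_key_value_heads: int = 4
    max_position_embeddings: int = 2048
    vocab_size: int = 32000
    rms_norm_eps: float = 1e-5
```

Since this class is passed to `DistributedTrainer` via `model_config_class`, fields you add here can then be read by your `MistralForTraining` code when the model is built.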