---
library_name: nanotron
---
# ⚙️ Nano-Mistral

Modeling code for Mistral to use with [Nanotron](https://github.com/huggingface/nanotron/)

## 🚀 Quickstart
```bash
# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1  # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
```
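Before launching, you can sanity-check the generated YAML. Below is a minimal sketch, assuming the config contains a `parallelism` section; the key names are illustrative and may differ across nanotron versions:

```python
# Sanity-check the generated config before launching torchrun.
# Key names are illustrative and may differ across nanotron versions.
import yaml

with open("config_tiny_mistral.yaml") as f:
    config = yaml.safe_load(f)

# The product of the parallelism degrees (dp * tp * pp) should match
# the total number of processes passed to --nproc_per_node.
print(config["parallelism"])
```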
## 🤖 Use your custom model

- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration (a sketch of such a config follows this list)
- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
- Pass both classes to the `DistributedTrainer` in `run_train.py`:
```python
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
```
- Run training as usual
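
For orientation, here is a minimal sketch of the kind of dataclass `config_tiny_mistral.py` defines. Field names and defaults are illustrative (loosely following Hugging Face's Mistral hyperparameter names), not the repository's actual values:

```python
from dataclasses import dataclass


@dataclass
class MistralConfig:
    """Illustrative model config; the real class may define more fields."""

    hidden_size: int = 512
    intermediate_size: int = 2048
    num_hidden_layers: int = 8
    num_attention_heads: int = 8
    num_key_value_heads: int = 4
    max_position_embeddings: int = 2048
    vocab_size: int = 32000
    rms_norm_eps: float = 1e-5
```

Since this class is passed to `DistributedTrainer` via `model_config_class`, fields you add here can then be read by your `MistralForTraining` code when the model is built.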