# Implementing a Model
1. Implement layers.
You can either implement the layers under `TTS/tts/layers/new_model.py` or in the model file `TTS/tts/models/new_model.py`.
You can also reuse layers already implemented.
2. Test layers.
We keep tests under the `tests` folder. You can add `tts` layer tests under the `tts_tests` folder.
Basic tests check input and output tensor shapes and the output values for a given input. Consider testing extreme cases that are more likely to cause problems, such as all-zero tensors.
3. Implement a loss function.
We keep loss functions under `TTS/tts/layers/losses.py`. You can also mix-and-match implemented loss functions as you like.
A loss function returns a dictionary in the format `{"loss": loss, "loss1": loss1, ...}`. The dictionary must at least define the `loss` key, which is the actual value used by the optimizer. All the items in the dictionary are automatically logged on the terminal and in Tensorboard.
4. Test the loss function.
As with the layers, you need to test the loss functions too. Check the input/output tensor shapes and the
expected output values for a given input tensor. For instance, certain loss functions have upper and lower limits, and
it is wise to test with inputs that should produce these limits.
5. Implement `MyModel`.
In 🐸TTS, a model class is a self-sufficient implementation of a model directing all the interactions with the other
components. It is enough to implement the API provided by the `BaseModel` class to comply.
A model interacts with the `Trainer API` for training and the `Synthesizer API` for inference and testing.
A 🐸TTS model must return a dictionary from its `forward()` and `inference()` functions. This dictionary must include the `model_outputs` key, which the `Trainer` and `Synthesizer` treat as the main model output.
You can place your `tts` model implementation under `TTS/tts/models/new_model.py`, then inherit from `BaseTTS` and implement it.
There is also the `callback` interface, by which you can manipulate both the model and the `Trainer` states. Callbacks give you
the flexibility to add custom behaviours to your model and training routines.
For more details, see {ref}`BaseTTS <Base tts Model>` and :obj:`TTS.utils.callbacks`.
6. Optionally, define `MyModelArgs`.
`MyModelArgs` is a 👨‍✈️Coqpit class that sets all the class arguments of `MyModel`. `MyModelArgs` must have
all the fields necessary to instantiate `MyModel`. However, for training, you need to pass `MyModelConfig` to
the model.
7. Test `MyModel`.
As with the layers and the loss functions, it is recommended to test your model. One effective approach is to
create two models with the exact same weights, run a training loop with one of them, and then
compare its weights with those of the other model. All the weights need to be different in a passing test. Otherwise, it
is likely that a part of the model is malfunctioning or not even attached to the model's computational graph.
8. Define `MyModelConfig`.
Place the `MyModelConfig` file under `TTS/models/configs`. It is enough to inherit from `BaseTTSConfig` to make your
config compatible with the `Trainer`. You should also include `MyModelArgs` as a field if defined. The rest of the fields should define the
model-specific values and parameters.
9. Write Docstrings.
We love you more when you document your code. ❤️
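Steps 2 to 4 above can be sketched as follows. This is a minimal illustration that assumes only PyTorch is installed; `ToyLayer`, `ToyLoss`, and the test function names are hypothetical and not part of 🐸TTS. The loss returns a dictionary with the required `loss` key, and the tests check tensor shapes, an all-zero input, and the lower limit of the loss.

```python
import torch
from torch import nn


class ToyLayer(nn.Module):
    """Hypothetical layer: projects `batch x time x channels` inputs."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.proj(x))


class ToyLoss(nn.Module):
    """Hypothetical criterion that mixes two existing PyTorch losses."""

    def forward(self, prediction: torch.Tensor, target: torch.Tensor) -> dict:
        l1 = nn.functional.l1_loss(prediction, target)
        mse = nn.functional.mse_loss(prediction, target)
        # "loss" is the value the optimizer uses; the other items are only logged.
        return {"loss": l1 + mse, "l1_loss": l1, "mse_loss": mse}


def test_layer_shapes_and_zero_input():
    layer = ToyLayer(in_channels=8, out_channels=4)
    x = torch.zeros(2, 5, 8)  # extreme case: all-zero input
    y = layer(x)
    assert y.shape == (2, 5, 4)
    assert torch.isfinite(y).all()


def test_loss_keys_and_lower_limit():
    criterion = ToyLoss()
    target = torch.randn(2, 5, 4)
    loss_dict = criterion(target.clone(), target)
    assert "loss" in loss_dict
    # Identical prediction and target should hit the lower limit of zero.
    assert loss_dict["loss"].item() == 0.0
    # A shifted prediction should produce a strictly positive loss.
    assert criterion(target + 1.0, target)["loss"].item() > 0.0


test_layer_shapes_and_zero_input()
test_loss_keys_and_lower_limit()
```

In a real contribution these functions would live in the test folders mentioned in step 2 and exercise your actual layers and criterion rather than the toy stand-ins.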
# Template 🐸TTS Model implementation
You can start implementing your model by copying the following base class.
```python
from typing import Dict, List, Tuple, Union

import torch
from coqpit import Coqpit
from torch import nn

from TTS.tts.models.base_tts import BaseTTS


class MyModel(BaseTTS):
    """
    Notes on input/output tensor shapes:
        Any input or output tensor of the model must be shaped as

        - 3D tensors `batch x time x channels`
        - 2D tensors `batch x channels`
        - 1D tensors `batch x 1`
    """

    def __init__(self, config: Coqpit):
        super().__init__()
        self._set_model_args(config)

    def _set_model_args(self, config: Coqpit):
        """Set model arguments from the config. Override this."""
        pass

    def forward(self, input: torch.Tensor, *args, aux_input={}, **kwargs) -> Dict:
        """Forward pass for the model mainly used in training.

        You can be flexible here and use a different number of arguments and argument names since it is intended to be
        used by `train_step()` without exposing it out of the model.

        Args:
            input (torch.Tensor): Input tensor.
            aux_input (Dict): Auxiliary model inputs like embeddings, durations or any other sorts of inputs.

        Returns:
            Dict: Model outputs. Main model output must be named as "model_outputs".
        """
        outputs_dict = {"model_outputs": None}
        ...
        return outputs_dict

    def inference(self, input: torch.Tensor, aux_input={}) -> Dict:
        """Forward pass for inference.

        We don't use `**kwargs` since it is problematic with the TorchScript API.

        Args:
            input (torch.Tensor): [description]
            aux_input (Dict): Auxiliary inputs like speaker embeddings, durations etc.

        Returns:
            Dict: [description]
        """
        outputs_dict = {"model_outputs": None}
        ...
        return outputs_dict

    def train_step(self, batch: Dict, criterion: nn.Module) -> Tuple[Dict, Dict]:
        """Perform a single training step. Run the model forward pass and compute losses.

        Args:
            batch (Dict): Input tensors.
            criterion (nn.Module): Loss layer designed for the model.

        Returns:
            Tuple[Dict, Dict]: Model outputs and computed losses.
        """
        outputs_dict = {}
        loss_dict = {}  # returned by the criterion
        ...
        return outputs_dict, loss_dict

    def train_log(self, batch: Dict, outputs: Dict, logger: "Logger", assets: Dict, steps: int) -> None:
        """Create visualizations and waveform examples for training.

        For example, here you can plot spectrograms and generate sample waveforms from these spectrograms to
        be projected onto Tensorboard.

        Args:
            batch (Dict): Model inputs used at the previous training step.
            outputs (Dict): Model outputs generated at the previous training step.
            logger (Logger): Logger object passed in by the `Trainer`.
            assets (Dict): Assets passed in by the `Trainer`.
            steps (int): Current training step count.
        """
        pass

    def eval_step(self, batch: Dict, criterion: nn.Module) -> Tuple[Dict, Dict]:
        """Perform a single evaluation step. Run the model forward pass and compute losses. In most cases, you can
        call `train_step()` with no changes.

        Args:
            batch (Dict): Input tensors.
            criterion (nn.Module): Loss layer designed for the model.

        Returns:
            Tuple[Dict, Dict]: Model outputs and computed losses.
        """
        outputs_dict = {}
        loss_dict = {}  # returned by the criterion
        ...
        return outputs_dict, loss_dict

    def eval_log(self, batch: Dict, outputs: Dict, logger: "Logger", assets: Dict, steps: int) -> None:
        """The same as `train_log()`"""
        pass

    def load_checkpoint(self, config: Coqpit, checkpoint_path: str, eval: bool = False) -> None:
        """Load a checkpoint and get ready for training or inference.

        Args:
            config (Coqpit): Model configuration.
            checkpoint_path (str): Path to the model checkpoint file.
            eval (bool, optional): If true, init model for inference else for training. Defaults to False.
        """
        ...

    def get_optimizer(self) -> Union["Optimizer", List["Optimizer"]]:
        """Set up and return the optimizer or optimizers."""
        pass

    def get_lr(self) -> Union[float, List[float]]:
        """Return learning rate(s).

        Returns:
            Union[float, List[float]]: Model's initial learning rates.
        """
        pass

    def get_scheduler(self, optimizer: torch.optim.Optimizer):
        pass

    def get_criterion(self):
        pass

    def format_batch(self):
        pass
```
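The weight comparison described in step 7 could be sketched as below for a model that follows the template's `forward()` and `train_step()` contract. This is an illustration under stated assumptions: `ToyModel` and `ToyCriterion` are hypothetical stand-ins that do not inherit from `BaseTTS`, so the snippet runs with PyTorch alone, and the `"input"` and `"target"` batch keys are invented for the example.

```python
import copy
from typing import Dict, Tuple

import torch
from torch import nn


class ToyCriterion(nn.Module):
    """Hypothetical loss layer returning a dictionary with the `loss` key."""

    def forward(self, prediction: torch.Tensor, target: torch.Tensor) -> Dict:
        return {"loss": nn.functional.mse_loss(prediction, target)}


class ToyModel(nn.Module):
    """Hypothetical stand-in mirroring the template's forward/train_step API."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.encoder = nn.Linear(channels, channels)
        self.decoder = nn.Linear(channels, channels)

    def forward(self, input: torch.Tensor) -> Dict:
        hidden = torch.tanh(self.encoder(input))
        return {"model_outputs": self.decoder(hidden)}

    def train_step(self, batch: Dict, criterion: nn.Module) -> Tuple[Dict, Dict]:
        outputs = self.forward(batch["input"])
        loss_dict = criterion(outputs["model_outputs"], batch["target"])
        return outputs, loss_dict


def check_all_weights_change(steps: int = 5) -> bool:
    torch.manual_seed(0)
    model = ToyModel()
    reference = copy.deepcopy(model)  # second model with the exact same weights
    criterion = ToyCriterion()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    batch = {"input": torch.randn(4, 10, 8), "target": torch.randn(4, 10, 8)}

    # Train only `model`; `reference` keeps the initial weights.
    for _ in range(steps):
        optimizer.zero_grad()
        _, loss_dict = model.train_step(batch, criterion)
        loss_dict["loss"].backward()
        optimizer.step()

    # Every parameter tensor must differ from its untouched copy.
    return all(
        not torch.equal(p, p_ref)
        for p, p_ref in zip(model.parameters(), reference.parameters())
    )


assert check_all_weights_change()
```

A parameter that stays identical after training usually means the corresponding module never contributed to the loss, which is the failure this check is meant to surface.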