Is it necessary to define the attention masks in .generate()?
Other Parler models are used like so:

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)

Your recommendation is like so:

generation = model.generate(
    input_ids=description_input_ids.input_ids,
    attention_mask=description_input_ids.attention_mask,
    prompt_input_ids=prompt_input_ids.input_ids,
    prompt_attention_mask=prompt_input_ids.attention_mask,
)
Is there a reason for this distinction? Is it necessary?
For context, I want to be able to switch between Parler models without writing model-specific code, i.e. use a single interface for all Parler models.
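Concretely, something like this is what I have in mind (an illustrative sketch of my own helper, not part of the library), with one code path that I would like to keep working unchanged across checkpoints:

# Illustrative helper of my own; generate_speech is not part of parler-tts.
# The goal is a single call pattern for every Parler checkpoint.
def generate_speech(model, tokenizer, description, prompt):
    description_ids = tokenizer(description, return_tensors="pt").input_ids
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    return model.generate(input_ids=description_ids, prompt_input_ids=prompt_ids)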
Hi,
You can use the first method if you are generating single samples. You should not see any difference, because for a single sample the attention mask that would be applied is equivalent to torch.ones(*size).bool(), i.e. every token is attended to anyway. When generating batches, however, the second option is the right one: the shorter samples in the batch are padded, and you do not want to attend to the pad tokens when processing the input.
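For illustration, here is a minimal sketch of the batched case, assuming the parler-tts package and using parler-tts/parler-tts-mini-v1 only as an example checkpoint (any Parler checkpoint should behave the same way):

from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

# Example checkpoint; swap in whichever Parler model you are using.
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1")
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

descriptions = [
    "A calm female voice with a slow pace.",
    "An excited male speaker talking quickly.",
]
prompts = [
    "Hello there.",
    "This is a noticeably longer sentence, so the batch needs padding.",
]

# padding=True pads the shorter entries with pad tokens, which is exactly why
# the attention masks are needed in the batched case.
description_inputs = tokenizer(descriptions, return_tensors="pt", padding=True)
prompt_inputs = tokenizer(prompts, return_tensors="pt", padding=True)

generation = model.generate(
    input_ids=description_inputs.input_ids,
    attention_mask=description_inputs.attention_mask,
    prompt_input_ids=prompt_inputs.input_ids,
    prompt_attention_mask=prompt_inputs.attention_mask,
)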
I see, thank you for the clarification! Maybe you could add this as a comment to the README, just so that it's clear there is no underlying difference from the other Parler models.