Update README.md
README.md
CHANGED
@@ -19,6 +19,8 @@ Our model hasn't been fine-tuned through reinforcement learning from human feedb
 
 Phi-2 was integrated in `transformers` version 4.37. If you need to use an earlier version, you need to pass `trust_remote_code=True` to the `from_pretrained()` function.
 
+Phi-2 is known for having an attention overflow issue (with FP16). If you are facing this issue, please enable/disable autocast on the [PhiAttention.forward()](https://huggingface.co/microsoft/phi-2/blob/main/modeling_phi.py#L306) function.
+
 ## Intended Uses
 
 Given the nature of the training data, the Phi-2 model is best suited for prompts using the QA format, the chat format, and the code format.
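
For context, a minimal sketch of what the `trust_remote_code=True` path in the unchanged paragraph above looks like on a `transformers` release older than 4.37; the model ID is the one this card documents, and everything else is standard `from_pretrained()` usage:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# On transformers < 4.37 the Phi-2 architecture is not bundled with the
# library, so trust_remote_code=True lets from_pretrained() execute the
# model code shipped in the repository (modeling_phi.py).
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
```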
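The new paragraph in this diff only names the workaround, not its mechanics. One illustrative reading, sketched below under stated assumptions: wrap each `PhiAttention.forward()` call so it runs with autocast disabled, forcing the attention computation out of the automatic FP16 casting that can overflow. The `patch_attention` helper is hypothetical (not part of the repo), `device_type="cuda"` is assumed, and whether this resolves the overflow depends on how the model was loaded.

```python
import torch
from functools import wraps

def patch_attention(model):
    """Hypothetical helper: disable autocast around every PhiAttention.forward().

    Assumes `model` is microsoft/phi-2 loaded with trust_remote_code=True,
    so the PhiAttention class comes from the repo's modeling_phi.py.
    """
    for module in model.modules():
        if module.__class__.__name__ == "PhiAttention":
            original_forward = module.forward

            @wraps(original_forward)
            def forward(*args, _orig=original_forward, **kwargs):
                # Run this attention call with autocast off so its matmuls
                # and softmax are not auto-cast down to FP16.
                with torch.autocast(device_type="cuda", enabled=False):
                    return _orig(*args, **kwargs)

            module.forward = forward
```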