No support for float16 on CPU?
Tried this model on CPU only with float16, and the code below gave me this error:
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", torch_dtype=torch.float16)

input_text = """
# The benefits of deadlifting
## INTRODUCTION
"""

randomizer_value = 0
repetitions = 1

# set the seed to reproduce results; feel free to change it to get different results
torch.manual_seed(randomizer_value)

input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# sample with top_k = 50, top_p = 0.95 and num_return_sequences = 1
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=2000,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
)
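(For reference, the snippet runs on CPU if the float16 cast is simply dropped, so the weights stay in the default float32; a minimal sketch of that change below. It is slower and heavier on memory, but every CPU kernel, including LayerNorm, supports float32.)

# Same as above, but without torch_dtype the weights stay in float32,
# for which all CPU ops (including LayerNorm) are implemented.
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b")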
I think this post answers my question:
https://twitter.com/pytorch/status/1450502321838960641?lang=en
FP16 is only supported in CUDA, BF16 has support on newer CPUs and TPUs
Calling .half() on your network and tensors explicitly casts them to FP16, but not all ops are safe to run in half-precision.
For CPU, BF16 is supported:
From 1.10 onwards, PyTorch has a generic API `torch.autocast()` that automatically casts
* CUDA tensors to FP16, and
* CPU tensors to BF16.
source: https://twitter.com/PyTorch/status/1450502326834368516
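A minimal sketch of what that could look like here, assuming PyTorch >= 1.10: keep the weights in float32 and let `torch.autocast` run the BF16-safe ops (e.g. matmuls) in bfloat16 while the rest (e.g. LayerNorm) falls back to float32.

import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b")  # float32 weights on CPU

input_ids = tokenizer("# The benefits of deadlifting\n", return_tensors="pt").input_ids

# CPU autocast: eligible ops run in bfloat16, non-BF16-safe ops stay in float32,
# so there is no "not implemented for 'Half'" error.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    sample_outputs = model.generate(input_ids, do_sample=True, max_length=200, top_k=50, top_p=0.95)

print(tokenizer.decode(sample_outputs[0]))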
Now the question is: can we use BF16 instead of FP16?
Again answering myself:
We’re empowering PyTorch 1.12 on the 3rd gen @Intel Xeon® Scalable processor (codename Cooper Lake). It’s the first general purpose x86 CPU with native bfloat16 support, showing a 1.4x to 2.2x performance gain over float32 on the TorchVision models
source: https://twitter.com/pytorch/status/1559611043273375746?lang=en
Conclusion:
Only recent Xeon processors support bfloat16 natively (Cooper Lake was introduced in June 2020).
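So instead of float16, the original snippet can load the model in bfloat16. A sketch under the assumption of a reasonably recent PyTorch build; on pre-Cooper-Lake CPUs it should still run, just without the native BF16 speedup.

import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
# bfloat16 halves the memory footprint vs float32, and recent PyTorch CPU builds
# ship BFloat16 kernels for ops like LayerNorm, unlike Half.
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", torch_dtype=torch.bfloat16)

input_ids = tokenizer("# The benefits of deadlifting\n", return_tensors="pt").input_ids
sample_outputs = model.generate(input_ids, do_sample=True, max_length=200, top_k=50, top_p=0.95)
print(tokenizer.decode(sample_outputs[0]))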