---
license: apache-2.0
---
Optimum Habana is the interface between the Transformers library and Habana's Gaudi processor (HPU). It provides a set of tools enabling easy and fast model loading and fine-tuning in single- and multi-HPU settings for different downstream tasks. Learn more about how to take advantage of Habana HPUs to train Transformers models at hf.co/hardware/habana.
# GPT2 model HPU configuration
This model contains just the `GaudiConfig` file for running the GPT2 model on Habana's Gaudi processors (HPU). It contains no model weights, only a `GaudiConfig`.

The `GaudiConfig` allows you to specify:
- `use_habana_mixed_precision`: whether to use Habana Mixed Precision (HMP)
  - `hmp_opt_level`: optimization level for HMP, see here for a detailed explanation
  - `hmp_bf16_ops`: list of operators that should run in bf16
  - `hmp_fp32_ops`: list of operators that should run in fp32
  - `hmp_is_verbose`: verbosity
- `use_fused_adam`: whether to use Habana's custom AdamW implementation
- `use_fused_clip_norm`: whether to use Habana's fused gradient norm clipping operator
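For illustration, the snippet below builds such a configuration in Python and saves it to disk. This is a minimal sketch with placeholder values, not the exact configuration shipped in this repository:

```python
from optimum.habana import GaudiConfig

# Illustrative values only; the actual Habana/gpt2 config may differ
gaudi_config = GaudiConfig(
    use_habana_mixed_precision=True,
    hmp_is_verbose=False,
    use_fused_adam=True,
    use_fused_clip_norm=True,
)

# Writes a gaudi_config.json that GaudiConfig.from_pretrained() can reload
gaudi_config.save_pretrained("/tmp/my_gaudi_config")
```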
## Usage
The model is instantiated the same way as in the Transformers library. The only difference is that the Gaudi configuration has to be loaded and provided to the trainer:
```python
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# Load the Gaudi configuration stored in this repository
gaudi_config = GaudiConfig.from_pretrained("Habana/gpt2")

# use_habana=True runs training on HPUs,
# use_lazy_mode=True enables HPU lazy-mode execution
args = GaudiTrainingArguments(
    output_dir="/tmp/output_dir",
    use_habana=True,
    use_lazy_mode=True,
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=gaudi_config,
    args=args,
    tokenizer=tokenizer,
)
trainer.train()
```
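Note that `trainer.train()` also needs a `train_dataset`, which the snippet above omits for brevity (and causal language modeling requires a model with an LM head, e.g. `GPT2LMHeadModel`). A minimal, hedged sketch of dataset preparation, assuming the `datasets` library and wikitext as a stand-in corpus:

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling

# Hypothetical dataset choice; any tokenized text corpus works
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False makes the collator build causal-LM labels from the input ids
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```

These would then be passed to `GaudiTrainer` via `train_dataset=train_dataset` and `data_collator=collator`.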