---
license: apache-2.0
---
Optimum Habana is the interface between the Transformers library and Habana's Gaudi processor (HPU). It provides a set of tools enabling easy and fast model loading and fine-tuning in single- and multi-HPU settings for different downstream tasks. Learn more about how to take advantage of Habana HPUs to train Transformers models at hf.co/hardware/habana.
# GPT2 model HPU configuration
This model contains just the `GaudiConfig` file for running the GPT2 model on Habana's Gaudi processors (HPU). It contains no model weights, only a `GaudiConfig`.

The `GaudiConfig` allows you to specify:
- `use_habana_mixed_precision`: whether to use Habana Mixed Precision (HMP)
  - `hmp_opt_level`: optimization level for HMP, see here for a detailed explanation
  - `hmp_bf16_ops`: list of operators that should run in bf16
  - `hmp_fp32_ops`: list of operators that should run in fp32
  - `hmp_is_verbose`: verbosity
- `use_fused_adam`: whether to use Habana's custom AdamW implementation
- `use_fused_clip_norm`: whether to use Habana's fused gradient norm clipping operator
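For illustration, the snippet below builds such a configuration in Python and saves it to disk. This is a minimal sketch with placeholder values, not the exact configuration shipped in this repository:

```python
from optimum.habana import GaudiConfig

# Illustrative values only; the actual Habana/gpt2 config may differ
gaudi_config = GaudiConfig(
    use_habana_mixed_precision=True,
    hmp_is_verbose=False,
    use_fused_adam=True,
    use_fused_clip_norm=True,
)

# Writes a gaudi_config.json that GaudiConfig.from_pretrained() can reload
gaudi_config.save_pretrained("/tmp/my_gaudi_config")
```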
## Usage
The model is instantiated the same way as in the Transformers library. The only difference is that the Gaudi configuration has to be loaded and provided to the trainer:
```python
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# Load the Gaudi configuration stored in this repository
gaudi_config = GaudiConfig.from_pretrained("Habana/gpt2")

# use_habana=True runs training on HPUs,
# use_lazy_mode=True enables HPU lazy-mode execution
args = GaudiTrainingArguments(
    output_dir="/tmp/output_dir",
    use_habana=True,
    use_lazy_mode=True,
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=gaudi_config,
    args=args,
    tokenizer=tokenizer,
)
trainer.train()
```
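Note that `trainer.train()` also needs a `train_dataset`, which the snippet above omits for brevity (and causal language modeling requires a model with an LM head, e.g. `GPT2LMHeadModel`). A minimal, hedged sketch of dataset preparation, assuming the `datasets` library and wikitext as a stand-in corpus:

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling

# Hypothetical dataset choice; any tokenized text corpus works
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False makes the collator build causal-LM labels from the input ids
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```

These would then be passed to `GaudiTrainer` via `train_dataset=train_dataset` and `data_collator=collator`.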