PEFT documentation

AdaLoRA


AdaLoRA is a method for optimizing the number of trainable parameters assigned to weight matrices and layers. Unlike LoRA, which distributes parameters evenly across all modules, AdaLoRA budgets more parameters for important weight matrices and layers, while less important ones receive fewer.
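For example, AdaLoRA plugs into the usual PEFT workflow via get_peft_model (a minimal sketch; the model name and hyperparameter values here are purely illustrative and should be tuned for your task):

```python
from transformers import AutoModelForSeq2SeqLM
from peft import AdaLoraConfig, get_peft_model

# Illustrative values only.
config = AdaLoraConfig(
    task_type="SEQ_2_SEQ_LM",
    init_r=12,        # starting rank for every incremental matrix
    target_r=8,       # target average rank after budget reallocation
    total_step=3000,  # total training steps; must be set before training
    target_modules=["q", "v"],
)

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
model = get_peft_model(model, config)
model.print_trainable_parameters()
```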

The abstract from the paper is:

Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA.

AdaLoraConfig

class peft.AdaLoraConfig


( task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None inference_mode: bool = False r: int = 8 target_modules: Optional[Union[list[str], str]] = None exclude_modules: Optional[Union[list[str], str]] = None lora_alpha: int = 8 lora_dropout: float = 0.0 fan_in_fan_out: bool = False bias: Literal['none', 'all', 'lora_only'] = 'none' use_rslora: bool = False modules_to_save: Optional[list[str]] = None init_lora_weights: bool | Literal['gaussian', 'eva', 'olora', 'pissa', 'pissa_niter_[number of iters]', 'loftq'] = True layers_to_transform: Optional[Union[list[int], int]] = None layers_pattern: Optional[Union[list[str], str]] = None rank_pattern: typing.Optional[dict] = None alpha_pattern: Optional[dict] = <factory> megatron_config: Optional[dict] = None megatron_core: Optional[str] = 'megatron.core' loftq_config: Union[LoftQConfig, dict] = <factory> eva_config: Optional[EvaConfig] = None use_dora: bool = False layer_replication: Optional[list[tuple[int, int]]] = None runtime_config: LoraRuntimeConfig = <factory> lora_bias: bool = False target_r: int = 8 init_r: int = 12 tinit: int = 0 tfinal: int = 0 deltaT: int = 1 beta1: float = 0.85 beta2: float = 0.85 orth_reg_weight: float = 0.5 total_step: typing.Optional[int] = None )

Parameters

  • target_r (int) — The target average rank of the incremental matrices.
  • init_r (int) — The initial rank of each incremental matrix.
  • tinit (int) — The number of warmup steps for initial fine-tuning.
  • tfinal (int) — The number of steps of final fine-tuning.
  • deltaT (int) — The time interval between two budget allocations.
  • beta1 (float) — The EMA hyperparameter for sensitivity smoothing.
  • beta2 (float) — The EMA hyperparameter for uncertainty quantification.
  • orth_reg_weight (float) — The coefficient of orthogonal regularization.
  • total_step (int) — The total number of training steps; it must be specified before training.
  • rank_pattern (list) — The rank allocated to each weight matrix by the RankAllocator.

This is the configuration class to store the configuration of an AdaLoraModel.
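The rank budget schedule implied by these parameters can be sketched in plain Python: the budget stays at init_r for the first tinit warmup steps, decays cubically toward target_r, and is held fixed for the final tfinal steps. The adalora_budget helper below is hypothetical and not part of the PEFT API; the actual logic lives inside PEFT's RankAllocator.

```python
def adalora_budget(step, init_r=12, target_r=8, tinit=200, tfinal=500, total_step=3000):
    """Sketch of AdaLoRA's cubic rank-budget decay (illustrative, not the PEFT API)."""
    if step <= tinit:
        return init_r  # warmup phase: keep the initial rank
    if step > total_step - tfinal:
        return target_r  # final fine-tuning phase: fixed target rank
    # cubic interpolation from init_r down to target_r
    coeff = 1 - (step - tinit) / (total_step - tfinal - tinit)
    return int((init_r - target_r) * coeff**3 + target_r)
```

With the defaults above, the budget starts at 12, stays there for the first 200 steps, and reaches 8 by step 2500.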

AdaLoraModel

class peft.AdaLoraModel


( model config adapter_name ) torch.nn.Module

Parameters

  • model ([transformers.PreTrainedModel]) — The model to be adapted.
  • config ([AdaLoraConfig]) — The configuration of the AdaLora model.
  • adapter_name (str) — The name of the adapter, defaults to “default”.
  • low_cpu_mem_usage (bool, optional, defaults to False) — Create empty adapter weights on meta device. Useful to speed up the loading process.

Returns

torch.nn.Module

The AdaLora model.

Creates an AdaLoRA (Adaptive LoRA) model from a pretrained transformers model. Paper: https://openreview.net/forum?id=lq62uWRJjiY

Example:

>>> from transformers import AutoModelForSeq2SeqLM
>>> from peft import LoraConfig, AdaLoraModel, AdaLoraConfig

>>> config = AdaLoraConfig(
...     peft_type="ADALORA",
...     task_type="SEQ_2_SEQ_LM",
...     init_r=12,
...     lora_alpha=32,
...     target_modules=["q", "v"],
...     lora_dropout=0.01,
... )
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> model = AdaLoraModel(model, config, "default")

Attributes:

  • model ([transformers.PreTrainedModel]) — The model to be adapted.
  • peft_config ([AdaLoraConfig]): The configuration of the AdaLora model.

add_weighted_adapter


( *args **kwargs )

This method is not supported for AdaLoRA, use LoRA instead.

update_and_allocate


( global_step )

Parameters

  • global_step (int) — The current training step; it is used to calculate the AdaLoRA budget.

This method updates the AdaLoRA budget and mask.

It should be called at every training step after loss.backward() and before zero_grad().

tinit, tfinal and deltaT are handled within the method.

Example:

>>> loss = model(**input).loss
>>> loss.backward()
>>> optimizer.step()
>>> model.base_model.update_and_allocate(i_step)
>>> optimizer.zero_grad()