# Image classification using LoRA This guide demonstrates how to use LoRA, a low-rank approximation technique, to fine-tune an image classification model. By using LoRA from 🤗 PEFT, we can reduce the number of trainable parameters in the model to only 0.77% of the original. LoRA achieves this reduction by adding low-rank "update matrices" to specific blocks of the model, such as the attention blocks. During fine-tuning, only these matrices are trained, while the original model parameters are left unchanged. At inference time, the update matrices are merged with the original model parameters to produce the final classification result. For more information on LoRA, please refer to the [original LoRA paper](https://arxiv.org/abs/2106.09685). ## Install dependencies Install the libraries required for model training: ```bash !pip install transformers accelerate evaluate datasets loralib peft -q ``` Check the versions of all required libraries to make sure you are up to date: ```python import transformers import accelerate import peft print(f"Transformers version: {transformers.__version__}") print(f"Accelerate version: {accelerate.__version__}") print(f"PEFT version: {peft.__version__}") "Transformers version: 4.27.4" "Accelerate version: 0.18.0" "PEFT version: 0.2.0" ``` ## Authenticate to share your model To share the fine-tuned model at the end of the training with the community, authenticate using your 🤗 token. You can obtain your token from your [account settings](https://huggingface.co/settings/token). ```python from huggingface_hub import notebook_login notebook_login() ``` ## Select a model checkpoint to fine-tune Choose a model checkpoint from any of the model architectures supported for [image classification](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads). When in doubt, refer to the [image classification task guide](https://huggingface.co/docs/transformers/v4.27.2/en/tasks/image_classification) in 🤗 Transformers documentation. ```python model_checkpoint = "google/vit-base-patch16-224-in21k" ``` ## Load a dataset To keep this example's runtime short, let's only load the first 5000 instances from the training set of the [Food-101 dataset](https://huggingface.co/datasets/food101): ```python from datasets import load_dataset dataset = load_dataset("food101", split="train[:5000]") ``` ## Dataset preparation To prepare the dataset for training and evaluation, create `label2id` and `id2label` dictionaries. These will come in handy when performing inference and for metadata information: ```python labels = dataset.features["label"].names label2id, id2label = dict(), dict() for i, label in enumerate(labels): label2id[label] = i id2label[i] = label id2label[2] "baklava" ``` Next, load the image processor of the model you're fine-tuning: ```python from transformers import AutoImageProcessor image_processor = AutoImageProcessor.from_pretrained(model_checkpoint) ``` The `image_processor` contains useful information on which size the training and evaluation images should be resized to, as well as values that should be used to normalize the pixel values. Using the `image_processor`, prepare transformation functions for the datasets. These functions will include data augmentation and pixel scaling: ```python from torchvision.transforms import ( CenterCrop, Compose, Normalize, RandomHorizontalFlip, RandomResizedCrop, Resize, ToTensor, ) normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std) train_transforms = Compose( [ RandomResizedCrop(image_processor.size["height"]), RandomHorizontalFlip(), ToTensor(), normalize, ] ) val_transforms = Compose( [ Resize(image_processor.size["height"]), CenterCrop(image_processor.size["height"]), ToTensor(), normalize, ] ) def preprocess_train(example_batch): """Apply train_transforms across a batch.""" example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]] return example_batch def preprocess_val(example_batch): """Apply val_transforms across a batch.""" example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]] return example_batch ``` Split the dataset into training and validation sets: ```python splits = dataset.train_test_split(test_size=0.1) train_ds = splits["train"] val_ds = splits["test"] ``` Finally, set the transformation functions for the datasets accordingly: ```python train_ds.set_transform(preprocess_train) val_ds.set_transform(preprocess_val) ``` ## Load and prepare a model Before loading the model, let's define a helper function to check the total number of parameters a model has, as well as how many of them are trainable. ```python def print_trainable_parameters(model): trainable_params = 0 all_param = 0 for _, param in model.named_parameters(): all_param += param.numel() if param.requires_grad: trainable_params += param.numel() print( f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}" ) ``` It's important to initialize the original model correctly as it will be used as a base to create the `PeftModel` you'll actually fine-tune. Specify the `label2id` and `id2label` so that [`~transformers.AutoModelForImageClassification`] can append a classification head to the underlying model, adapted for this dataset. You should see the following output: ``` Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.weight', 'classifier.bias'] ``` ```python from transformers import AutoModelForImageClassification, TrainingArguments, Trainer model = AutoModelForImageClassification.from_pretrained( model_checkpoint, label2id=label2id, id2label=id2label, ignore_mismatched_sizes=True, # provide this in case you're planning to fine-tune an already fine-tuned checkpoint ) ``` Before creating a `PeftModel`, you can check the number of trainable parameters in the original model: ```python print_trainable_parameters(model) "trainable params: 85876325 || all params: 85876325 || trainable%: 100.00" ``` Next, use `get_peft_model` to wrap the base model so that "update" matrices are added to the respective places. ```python from peft import LoraConfig, get_peft_model config = LoraConfig( r=16, lora_alpha=16, target_modules=["query", "value"], lora_dropout=0.1, bias="none", modules_to_save=["classifier"], ) lora_model = get_peft_model(model, config) print_trainable_parameters(lora_model) "trainable params: 667493 || all params: 86466149 || trainable%: 0.77" ``` Let's unpack what's going on here. To use LoRA, you need to specify the target modules in `LoraConfig` so that `get_peft_model()` knows which modules inside our model need to be amended with LoRA matrices. In this example, we're only interested in targeting the query and value matrices of the attention blocks of the base model. Since the parameters corresponding to these matrices are "named" "query" and "value" respectively, we specify them accordingly in the `target_modules` argument of `LoraConfig`. We also specify `modules_to_save`. After wrapping the base model with `get_peft_model()` along with the `config`, we get a new model where only the LoRA parameters are trainable (so-called "update matrices") while the pre-trained parameters are kept frozen. However, we want the classifier parameters to be trained too when fine-tuning the base model on our custom dataset. To ensure that the classifier parameters are also trained, we specify `modules_to_save`. This also ensures that these modules are serialized alongside the LoRA trainable parameters when using utilities like `save_pretrained()` and `push_to_hub()`. Here's what the other parameters mean: - `r`: The dimension used by the LoRA update matrices. - `alpha`: Scaling factor. - `bias`: Specifies if the `bias` parameters should be trained. `None` denotes none of the `bias` parameters will be trained. `r` and `alpha` together control the total number of final trainable parameters when using LoRA, giving you the flexibility to balance a trade-off between end performance and compute efficiency. By looking at the number of trainable parameters, you can see how many parameters we're actually training. Since the goal is to achieve parameter-efficient fine-tuning, you should expect to see fewer trainable parameters in the `lora_model` in comparison to the original model, which is indeed the case here. ## Define training arguments For model fine-tuning, use [`~transformers.Trainer`]. It accepts several arguments which you can wrap using [`~transformers.TrainingArguments`]. ```python from transformers import TrainingArguments, Trainer model_name = model_checkpoint.split("/")[-1] batch_size = 128 args = TrainingArguments( f"{model_name}-finetuned-lora-food101", remove_unused_columns=False, evaluation_strategy="epoch", save_strategy="epoch", learning_rate=5e-3, per_device_train_batch_size=batch_size, gradient_accumulation_steps=4, per_device_eval_batch_size=batch_size, fp16=True, num_train_epochs=5, logging_steps=10, load_best_model_at_end=True, metric_for_best_model="accuracy", push_to_hub=True, label_names=["labels"], ) ``` Compared to non-PEFT methods, you can use a larger batch size since there are fewer parameters to train. You can also set a larger learning rate than the normal (1e-5 for example). This can potentially also reduce the need to conduct expensive hyperparameter tuning experiments. ## Prepare evaluation metric ```python import numpy as np import evaluate metric = evaluate.load("accuracy") def compute_metrics(eval_pred): """Computes accuracy on a batch of predictions""" predictions = np.argmax(eval_pred.predictions, axis=1) return metric.compute(predictions=predictions, references=eval_pred.label_ids) ``` The `compute_metrics` function takes a named tuple as input: `predictions`, which are the logits of the model as Numpy arrays, and `label_ids`, which are the ground-truth labels as Numpy arrays. ## Define collation function A collation function is used by [`~transformers.Trainer`] to gather a batch of training and evaluation examples and prepare them in a format that is acceptable by the underlying model. ```python import torch def collate_fn(examples): pixel_values = torch.stack([example["pixel_values"] for example in examples]) labels = torch.tensor([example["label"] for example in examples]) return {"pixel_values": pixel_values, "labels": labels} ``` ## Train and evaluate Bring everything together - model, training arguments, data, collation function, etc. Then, start the training! ```python trainer = Trainer( model, args, train_dataset=train_ds, eval_dataset=val_ds, tokenizer=image_processor, compute_metrics=compute_metrics, data_collator=collate_fn, ) train_results = trainer.train() ``` In just a few minutes, the fine-tuned model shows 96% validation accuracy even on this small subset of the training dataset. ```python trainer.evaluate(val_ds) { "eval_loss": 0.14475855231285095, "eval_accuracy": 0.96, "eval_runtime": 3.5725, "eval_samples_per_second": 139.958, "eval_steps_per_second": 1.12, "epoch": 5.0, } ``` ## Share your model and run inference Once the fine-tuning is done, share the LoRA parameters with the community like so: ```python repo_name = f"sayakpaul/{model_name}-finetuned-lora-food101" lora_model.push_to_hub(repo_name) ``` When calling [`~transformers.PreTrainedModel.push_to_hub`] on the `lora_model`, only the LoRA parameters along with any modules specified in `modules_to_save` are saved. Take a look at the [trained LoRA parameters](https://huggingface.co/sayakpaul/vit-base-patch16-224-in21k-finetuned-lora-food101/blob/main/adapter_model.bin). You'll see that it's only 2.6 MB! This greatly helps with portability, especially when using a very large model to fine-tune (such as [BLOOM](https://huggingface.co/bigscience/bloom)). Next, let's see how to load the LoRA updated parameters along with our base model for inference. When you wrap a base model with `PeftModel`, modifications are done *in-place*. To mitigate any concerns that might stem from in-place modifications, initialize the base model just like you did earlier and construct the inference model. ```python from peft import PeftConfig, PeftModel config = PeftConfig.from_pretrained(repo_name) model = AutoModelForImageClassification.from_pretrained( config.base_model_name_or_path, label2id=label2id, id2label=id2label, ignore_mismatched_sizes=True, # provide this in case you're planning to fine-tune an already fine-tuned checkpoint ) # Load the LoRA model inference_model = PeftModel.from_pretrained(model, repo_name) ``` Let's now fetch an example image for inference. ```python from PIL import Image import requests url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg" image = Image.open(requests.get(url, stream=True).raw) image ```
image of beignets
First, instantiate an `image_processor` from the underlying model repo. ```python image_processor = AutoImageProcessor.from_pretrained(repo_name) ``` Then, prepare the example for inference. ```python encoding = image_processor(image.convert("RGB"), return_tensors="pt") ``` Finally, run inference! ```python with torch.no_grad(): outputs = inference_model(**encoding) logits = outputs.logits predicted_class_idx = logits.argmax(-1).item() print("Predicted class:", inference_model.config.id2label[predicted_class_idx]) "Predicted class: beignets" ```