---
license: other
license_name: custom-apple-license
license_link: https://github.com/apple/ml-tic-clip/blob/main/LICENSE
tags:
- vision
- zero-shot-image-classification
datasets:
- apple/TiC-DataComp
library_name: tic-clip
---
# Model Card for TiC-CLIP-bestpool-oracle

This repository contains TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) with data from 2014 to 2022 using our modified OpenCLIP code.
For additional information refer to our [GitHub repo](https://github.com/apple/ml-tic-clip).

## Model Details

### Model Description

Keeping large foundation models up to date on the latest data is inherently expensive.
To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models.
This problem is exacerbated by the lack of any large-scale continual learning benchmarks or baselines.
We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps.
TiC-DataComp, our largest dataset, contains over 12.7B timestamped image-text pairs spanning 9 years (2014-2022).
We first use our benchmarks to curate various dynamic evaluations to measure the temporal robustness of existing models.
We show that OpenAI's CLIP (trained on data up to 2020) loses ≈8% zero-shot accuracy on our curated retrieval task from 2021-2022 compared with more recently trained models in the OpenCLIP repository.
We then study how to efficiently train models on time-continuous data.
We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by 2.5× when compared to the standard practice of retraining from scratch.
Code is available in the [ml-tic-clip GitHub repo](https://github.com/apple/ml-tic-clip).

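
The rehearsal idea described above (continue from the last checkpoint while replaying older data) can be pictured with a small sampling sketch. This is only an illustration under stated assumptions, not the training code from the repository: the function name `build_rehearsal_mix`, the shard names, and the `replay_fraction` knob are all hypothetical and are not taken from the paper or from OpenCLIP.

```python
import random
from typing import Dict, List, Sequence


def build_rehearsal_mix(
    shards_by_year: Dict[int, Sequence[str]],
    current_year: int,
    num_samples: int,
    replay_fraction: float = 0.5,  # hypothetical knob, not a value from the paper
    seed: int = 0,
) -> List[str]:
    """Mix current-year shards with shards replayed from earlier years."""
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must be in [0, 1]")

    rng = random.Random(seed)
    new_pool = list(shards_by_year.get(current_year, []))
    if not new_pool:
        raise ValueError(f"no shards available for year {current_year}")

    # Everything strictly older than the current year is eligible for replay;
    # future years are never used.
    old_pool = [
        shard
        for year, shards in sorted(shards_by_year.items())
        if year < current_year
        for shard in shards
    ]

    # Without any older data there is nothing to replay.
    num_old = int(num_samples * replay_fraction) if old_pool else 0
    num_new = num_samples - num_old

    # Sample with replacement so small pools can still fill the budget.
    mix = rng.choices(new_pool, k=num_new)
    if num_old:
        mix += rng.choices(old_pool, k=num_old)
    rng.shuffle(mix)
    return mix


if __name__ == "__main__":
    # Hypothetical shard names, for illustration only.
    shards = {
        2016: ["2016/a", "2016/b"],
        2017: ["2017/a"],
        2018: ["2018/a", "2018/b"],
    }
    print(build_rehearsal_mix(shards, current_year=2018, num_samples=8, replay_fraction=0.25))
```

With `replay_fraction=0.25` and a budget of 8, six entries come from the current year and two from earlier years. How the replayed data should actually be selected and weighted is discussed in the paper; the fixed fraction here is just a placeholder.
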
- **Developed by:** Apple
- **License:** See [LICENSE](https://github.com/apple/ml-tic-clip/blob/main/LICENSE)

### Model Sources

- **Repository:** [ml-tic-clip GitHub repo](https://github.com/apple/ml-tic-clip)
- **Paper:** [TiC-CLIP: Continual Training of CLIP Models, Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F., International Conference on Learning Representations (ICLR), 2024.](https://arxiv.org/abs/2310.16226)

## Uses

Researchers can use TiC-CLIP pretrained models to design continual learning methods faster by starting from a pretrained checkpoint and continuing training on the next year's or next month's data.

## How to Get Started with the Model

The models are compatible with the DataComp evaluation suite and with our patched version of DataComp for evaluation on TiC-DataComp-Retrieval and TiC-DataCompNet.
The models can also be used to resume training or as initialization for new training using OpenCLIP code.
Please follow the instructions in our [GitHub repo](https://github.com/apple/ml-tic-clip) to create the evaluation sets, or follow [DataComp](https://github.com/mlfoundations/datacomp) for the standard evaluations on 38 datasets.

The following snippet assumes the TiC-DataComp data has been prepared following the instructions in the GitHub repo.

### Training

```bash
YEAR=2016  # There are no models before 2016 since data from 2014-2016 were combined into one year
REPO="apple/TiC-CLIP-bestpool-oracle"
huggingface-cli download $REPO checkpoints/$YEAR.pt

## Train Cumulative
pushd datacomp
final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/
torchrun --nproc_per_node 8 --nnodes 1 \
    train.py \
    --scale "tic_medium" \
    --dataset_resampled \
    --data_dir $final_data_dir \
    --output_dir "./results/" \
    --exp_name "datacomp_medium-basic_cumulative" \
    --imagenet_val $IMAGENET_VAL_PATH \
    --save_frequency 1 \
    --resume
popd
```

### Evaluation

```bash
## Evaluate Model
# Evaluate a ViT-B/16 model on TiC/Retrieval/Yearly/$YEAR and
# TiC/DataCompNet/Yearly/$YEAR
pushd datacomp
python ../dataset_creation/tic-datacomp/generate_tasklist.py --yaml-path tasklist.yml --sample-eval --eval-tasks retrieval/yearly,datacompnet/yearly
python evaluate.py --data_dir data/ --train_output_dir ./results --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
popd
```

### OpenCLIP Load and Inference Example

```python
import open_clip
import torch
from huggingface_hub import hf_hub_download
from PIL import Image

# Download the 2016 checkpoint from this repository and load it into a ViT-B/16 model.
filename = hf_hub_download(repo_id="apple/TiC-CLIP-bestpool-oracle", filename="checkpoints/2016.pt")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', filename)
tokenizer = open_clip.get_tokenizer('ViT-B-16')

# Prepare a single image (with a batch dimension) and the candidate captions.
image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product below is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```

## Training Details

### Training Data

Please refer to [TiC-DataComp](https://huggingface.co/datasets/apple/TiC-DataComp).

### Training Procedure

Please refer to Sections 2-3 of our [TiC-CLIP](https://github.com/apple/ml-tic-clip) paper.

## Citation

**[TiC-CLIP: Continual Training of CLIP Models](https://arxiv.org/abs/2310.16226). (ICLR 2024)**
*Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F.*

```bibtex
@inproceedings{garg2024tic,
  title={TiC-CLIP: Continual Training of CLIP Models},
  author={Garg, Saurabh and Farajtabar, Mehrdad and Pouransari, Hadi and Vemulapalli, Raviteja and Mehta, Sachin and Tuzel, Oncel and Shankar, Vaishaal and Faghri, Fartash},
  booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
  year={2024},
  url={https://openreview.net/forum?id=TLADT8Wrhn}
}