The CVPR Survival Guide: Discovering Research That's Interesting to YOU!

Community Article Published June 14, 2024

The 2024 Conference on Computer Vision and Pattern Recognition (CVPR) received 11,532 valid paper submissions, and only 2,719 were accepted for an overall acceptance rate of about 23.6%.

But keeping up with the vast array of research being presented at this year's CVPR can be challenging. CVPR has an awesome website listing all the papers, but the information I want is scattered across various links and platforms. Needless to say, getting a good idea of what's being presented is time-consuming (and a bit disorganized).

But what if you could access all this knowledge in one convenient location, allowing you to easily identify trends and gain valuable insights?

Well, I curated a dataset, hosted on Hugging Face and built with FiftyOne, that does just that: it helps you explore this year's conference offerings. I was able to find/scrape 2,389 of the 2,719 accepted papers and I put them into a dataset that we're going to explore together!
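As a quick sanity check on the numbers quoted above, here's the arithmetic in plain Python. Nothing here touches the dataset; it only reuses the figures from this post.

```python
# Figures quoted in this post
submitted = 11_532   # valid submissions to CVPR 2024
accepted = 2_719     # accepted papers
in_dataset = 2_389   # accepted papers that made it into the dataset

acceptance_rate = accepted / submitted
coverage = in_dataset / accepted

print(f"Acceptance rate: {acceptance_rate:.1%}")
print(f"Dataset coverage: {coverage:.1%}")
```

So the dataset covers roughly 88% of the accepted papers.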

Btw, this post is available as a Google Colab notebook here

🧐 What's in this dataset?

The dataset consists of images of the first page of each paper, along with its title, list of authors, abstract, a direct link to the paper on arXiv, a link to its project page (if found), a category breakdown according to the arXiv taxonomy, and keywords that I bucketed from the 2024 CVPR call for papers.

Here are the fields:

  • An image of the first page of the paper

  • title: The title of the paper

  • authors_list: The list of authors

  • abstract: The abstract of the paper

  • arxiv_link: Link to the paper on arXiv

  • other_link: Link to the project page, if found

  • category_name: The primary category of this paper, according to the arXiv taxonomy

  • all_categories: All categories this paper falls into, according to arXiv taxonomy

  • keywords: Extracted using GPT-4o

This should give us enough information to pick up on some interesting trends for this year's CVPR!

PS: You can check out my picks for awesome papers at CVPR in my GitHub repo. Here's some general code for how I scraped the CVPR data.

Let's start by installing some dependencies:

%%capture
!pip install fiftyone sentence-transformers umap-learn lancedb scikit-learn==1.4.2

This tutorial will make use of the clustering plugin. Check out all available plugins here.

!fiftyone plugins download https://github.com/jacobmarks/clustering-plugin

import fiftyone as fo
import fiftyone.utils.huggingface as fouh

FiftyOne integrates natively with Hugging Face's datasets library.

You can easily load, fine-tune, and run inference with Transformers models on FiftyOne datasets. It also integrates with the Hugging Face Hub, which allows you to push datasets to and load datasets from the Hub. It's a nice integration that simplifies sharing datasets with the machine learning community and accessing popular vision datasets. You can load datasets from specific revisions, handle multiple media fields, and configure advanced settings through the integration - check out the Hugging Face organization page here to see what datasets are available.

I've posted the dataset on Hugging Face - feel free to smash a like on it to help spread the word - and you can access it as follows:

dataset = fouh.load_from_hub("Voxel51/CVPR_2024_Papers")

You've now loaded the dataset into FiftyOne format!

The FiftyOne dataset object is a core component of the FiftyOne library. It's the central hub for managing and interacting with datasets in the FiftyOne ecosystem. It gives you a high-level interface for performing various dataset-related tasks, such as loading data, applying transformations, evaluating models, and exporting datasets in different formats.

The dataset object represents a collection of samples along with fields (associated metadata, labels, and other annotations). The dataset object provides a convenient way to store, manipulate, and query datasets in FiftyOne.

Some key features of the FiftyOne dataset object include:

  1. Support for various data types: The dataset object can handle different types of data, such as images, videos, and associated annotations like bounding boxes, segmentation masks, arbitrary text, and classification labels.

  2. Flexible metadata: Each sample in the dataset can have associated metadata, which can be used to store additional information about the sample, such as its source, attributes, or custom fields.

  3. Powerful querying: FiftyOne provides a query language that allows you to filter and select subsets of samples based on their metadata, labels, or other criteria.

  4. Visualization: The dataset object integrates with FiftyOne's visualization tools, enabling you to visualize samples, annotations, and model predictions directly in the browser.

Take a look at the app below.

With it you can get insight into the distribution of keywords, categories, and the number of papers a given author (or, at least, someone with that name) has attributed to them at this year's conference!


session = fo.launch_app(dataset, auto=False)
session.show()
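If you'd rather pull those distributions programmatically than click through the app, here's a minimal sketch. It assumes `dataset` is the FiftyOne dataset loaded above and relies on `count_values()`, which returns a dict mapping each distinct value of a field to its count. `top_values` is a hypothetical helper name, and I'm assuming fields like `keywords` hold strings or lists of strings.

```python
from collections import Counter


def top_values(dataset, field, k=10):
    """Return the k most common values of `field` as (value, count) pairs.

    `dataset` is assumed to be a FiftyOne dataset or view.
    """
    # count_values() returns a dict of {value: number of occurrences}
    counts = Counter(dataset.count_values(field))
    return counts.most_common(k)


# Example usage (assumes `dataset` was loaded as shown earlier):
# top_values(dataset, "category_name")
# top_values(dataset, "keywords", k=20)
```

The result is sorted from most to least frequent, which makes it easy to eyeball the dominant categories or keywords.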

You can do more interesting analysis from here. Start by getting embeddings for the title and abstract of each paper. For that, you can make use of gte-large-en-v1.5. It's small, it's fast, and it's good.

Of course, feel free to choose any model you'd like.

%%capture
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    'Alibaba-NLP/gte-large-en-v1.5', 
    trust_remote_code=True
)

The code below will help generate and add text embeddings to a FiftyOne dataset.

  1. get_text_embeddings(dataset, field, model):

    • This function takes a FiftyOne dataset, a field name containing text data, and a pre-trained embedding model.

    • It retrieves the text data from the specified field of the dataset.

    • It generates embeddings for each text using the provided embedding model.

    • It returns a list of embeddings.

  2. add_embeddings_to_dataset(dataset, field, embeddings):

    • This function takes a FiftyOne dataset, a field name to store the embeddings, and a list of embeddings.

    • It adds a new sample field to the dataset to store the embeddings.

    • It sets the values of the newly added field to the provided embeddings.

Basically, it will:

  1. Extract text data from a specific field in a FiftyOne dataset.

  2. Generate embeddings for each text using a pre-trained embedding model.

  3. Add the generated embeddings back to the dataset as a new field.

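def get_text_embeddings(dataset, field, model):
    """
    Computes an embedding for the text stored in a field of each sample.

    Args:
        dataset: A FiftyOne dataset object.
        field: The name of the field containing the text to embed.
        model: An embedding model with an encode() method, such as a
            SentenceTransformer.

    Returns:
        A list of embeddings, one per sample.
    """
    texts = dataset.values(field)
    text_embeddings = []
    for text in texts:
        # Encode one text at a time; encode() returns a single vector
        embeddings = model.encode(text)
        text_embeddings.append(embeddings)
    return text_embeddings


def add_embeddings_to_dataset(dataset, field, embeddings):
    """
    Adds the embeddings to the dataset.

    Args:
        dataset: A FiftyOne dataset object.
        field: The name of the new field in which to store the embeddings.
        embeddings: A list of embeddings, one per sample.
    """
    dataset.add_sample_field(field, fo.VectorField)
    dataset.set_values(field, embeddings)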
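Encoding one text at a time is easy to follow but can be slow. If that bothers you, here's a sketch of a batched variant. `get_text_embeddings_batched` is a hypothetical helper, not part of FiftyOne; it assumes the model's `encode()` accepts a list of strings and a `batch_size` argument (as SentenceTransformer models do), and it swaps missing text for an empty string, which is my assumption about how you'd want to handle gaps.

```python
def get_text_embeddings_batched(dataset, field, model, batch_size=32):
    """Batched variant of get_text_embeddings() (hypothetical helper).

    Assumes `model.encode()` accepts a list of strings and a `batch_size`
    argument, as SentenceTransformer models do.
    """
    texts = dataset.values(field)
    # Assumption: replace missing values with an empty string so that
    # every sample still gets an embedding
    texts = [text if text is not None else "" for text in texts]
    embeddings = model.encode(texts, batch_size=batch_size)
    # One embedding per sample, in dataset order
    return list(embeddings)
```

The return value has the same shape as before (one embedding per sample), so it can be passed to `add_embeddings_to_dataset()` unchanged.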

Now, add the embeddings to the dataset!

abstract_embeddings = get_text_embeddings(
    dataset=dataset,
    field="abstract",
    model=model
)

add_embeddings_to_dataset(
    dataset=dataset,
    field="abstract_embeddings",
    embeddings=abstract_embeddings
)

title_embeddings = get_text_embeddings(
    dataset=dataset,
    field="title",
    model=model
)

add_embeddings_to_dataset(
    dataset=dataset,
    field="title_embeddings",
    embeddings=title_embeddings
)

Making use of the embeddings

You can use FiftyOne Brain to do some cool stuff with embeddings, like:

  • Visualizing datasets in low-dimensional embedding spaces to reveal patterns and clusters that can help identify failure modes, critical scenarios, and recommend new samples to add to your training set

  • Computing uniqueness scores for images to identify the most unique samples that are vital for efficient model training in the early stages of the machine learning workflow

  • Indexing datasets by similarity to easily find similar examples to images or objects of interest, which is useful for diagnosing model issues or mining for additional training data

Visualizing embeddings

Below are the supported dimensionality reduction methods in the Brain:

UMAP (Uniform Manifold Approximation and Projection)

UMAP is a dimensionality reduction technique that uses applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data.

It is particularly well-suited for text embeddings because it can handle high-dimensional data while preserving local neighborhoods and much of the global structure of the data, making it useful for both visualization and preprocessing for clustering algorithms.

t-SNE (t-distributed Stochastic Neighbor Embedding)

t-SNE is a non-linear dimensionality reduction technique that is also used for visualizing high-dimensional data. It is similar to UMAP but tends to be slower and less scalable.

While it can be effective for certain types of data, it may not perform as well as UMAP for large datasets.

PCA (Principal Component Analysis)

PCA is a linear dimensionality reduction technique that projects high-dimensional data onto lower-dimensional subspaces. It is fast and easy to implement but may not capture non-linear relationships in the data as effectively as UMAP or t-SNE.

PCA is often used for simpler data sets where linearity is a reasonable assumption.
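To make the "linear projection" idea concrete, here's a toy sketch in plain Python. It finds the first principal component of a handful of 2D points using power iteration and projects the points onto it (2D to 1D). This is only an illustration of the concept, not how the Brain computes PCA; the helper names and the toy data are made up for this example.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (mean, direction) of the top principal component of 2D points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(2)]
    centered = [[p[0] - mean[0], p[1] - mean[1]] for p in points]

    # 2x2 sample covariance matrix of the centered data
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(2)]
        for i in range(2)
    ]

    # Power iteration: repeatedly applying the covariance matrix to a vector
    # pulls it toward the direction of greatest variance. The fixed starting
    # vector is fine for this toy data but is not a robust general choice.
    v = [1.0, 0.0]
    for _ in range(iterations):
        w = [
            cov[0][0] * v[0] + cov[0][1] * v[1],
            cov[1][0] * v[0] + cov[1][1] * v[1],
        ]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return mean, v


def project(points, mean, direction):
    """Project each 2D point onto the given direction (2D -> 1D)."""
    return [
        (p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
        for p in points
    ]


# Toy data lying roughly along the line y = 2x
points = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.1), (3.0, 5.9), (4.0, 8.1)]
mean, direction = first_principal_component(points)
coords_1d = project(points, mean, direction)
```

Because the toy points sit close to a straight line, a single coordinate along that line keeps almost all of the variation. Data with curved structure would lose more under a linear projection like this one.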

Manual

Manually computing a low-dimensional representation involves creating a custom method to reduce the dimensionality of the data. This approach can be time-consuming and requires a deep understanding of the data and the desired outcome.

import fiftyone.brain as fob

fob.compute_visualization(
    dataset,
    embeddings="abstract_embeddings",
    num_dims=2,
    method="umap",
    brain_key="umap_abstract",
    verbose=True,
    seed=51
)

fob.compute_visualization(
    dataset,
    embeddings="title_embeddings",
    num_dims=2,
    method="umap",
    brain_key="umap_title",
    verbose=True,
    seed=51
)

Computing uniqueness

The code below adds a uniqueness field to each sample, scoring how unique it is with respect to the rest of the samples. This is interesting because it lets you see which papers are the most unique (based on their abstracts) among all the papers in the dataset.

fob.compute_uniqueness(
    dataset,
    embeddings="abstract_embeddings",
    uniqueness_field="uniqueness_abstract",
)

fob.compute_uniqueness(
    dataset,
    embeddings="title_embeddings",
    uniqueness_field="uniqueness_title",
)
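Once the scores are in place, you can list the highest-scoring papers without opening the app. Here's a minimal sketch, assuming `dataset` is the FiftyOne dataset from this post and that the uniqueness field has already been populated. `most_unique_papers` is a hypothetical helper; `sort_by()`, `limit()`, and `values()` are standard FiftyOne view methods.

```python
def most_unique_papers(dataset, uniqueness_field="uniqueness_abstract", k=5):
    """Return (title, score) pairs for the k highest-scoring samples.

    `dataset` is assumed to be a FiftyOne dataset or view whose
    `uniqueness_field` was populated by compute_uniqueness().
    """
    # Highest uniqueness first, then keep only the top k samples
    view = dataset.sort_by(uniqueness_field, reverse=True).limit(k)
    titles = view.values("title")
    scores = view.values(uniqueness_field)
    return list(zip(titles, scores))


# Example usage (assumes the uniqueness scores were computed as above):
# for title, score in most_unique_papers(dataset, k=10):
#     print(f"{score:.3f}  {title}")
```

Swap in "uniqueness_title" to rank by title embeddings instead.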

Computing similarity

The code below indexes the abstract embeddings by similarity so that you can easily query and sort your dataset to find similar examples. Once you've indexed a dataset by similarity, you can use the sort_by_similarity() view stage to programmatically sort the dataset by abstract similarity! The code below uses LanceDB as the backend (read about the integration here), but at the moment there are several other backends you can use:

  • sklearn (default): a scikit-learn backend

  • qdrant: a Qdrant backend

  • redis: a Redis backend

  • pinecone: a Pinecone backend

  • mongodb: a MongoDB backend

  • milvus: a Milvus backend

The library is open source and we welcome contributions, so feel free to contribute an integration for your favorite vector database.

sim_abstract = fob.compute_similarity(
    dataset,
    embeddings="abstract_embeddings",
    brain_key="abstract_similarity",
    backend="lancedb",
)
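With the index in place, here's a sketch of how you might search for papers by free text. It assumes `model` is the same embedding model used for the abstracts (so the query lands in the same vector space) and that the index accepts a raw query vector. `find_similar_papers` is a hypothetical helper; `sort_by_similarity()` is the view stage mentioned above.

```python
def find_similar_papers(dataset, model, query_text, k=5,
                        brain_key="abstract_similarity"):
    """Return the titles of the k papers whose abstracts best match a query.

    Assumes `model` produced the embeddings stored in the similarity index
    identified by `brain_key`.
    """
    # Embed the query with the same model used for the abstracts
    query_vector = model.encode(query_text)
    # Sort by similarity to the query vector and keep the k nearest samples
    view = dataset.sort_by_similarity(query_vector, k=k, brain_key=brain_key)
    return view.values("title")


# Example usage (assumes the index above has been built):
# find_similar_papers(dataset, model, "3D Gaussian splatting for avatars")
```

You can also pass a sample ID as the query to find papers similar to one you already like.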

🔊 Now, let's check all this out in the app! Turn your audio on because I'll explain what I'm doing!


There's a lot more that you can do with FiftyOne, more than I can share in this note-blog. But, I hope you'll join me for a workshop where I'll spend ~90 minutes teaching you how to use FiftyOne! Sign up here!