arxiv:2307.02040

VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks

Published on Jul 5, 2023

Authors:

Abstract

Vertical Federated Learning (VFL) is a crucial paradigm for training machine learning models on feature-partitioned, distributed data. However, due to privacy restrictions, few public real-world VFL datasets exist for algorithm evaluation, and these represent a limited array of feature distributions. Existing benchmarks often resort to synthetic datasets derived by arbitrarily splitting the features of a global dataset. Such splits capture only a subset of possible feature distributions, which leads to inadequate assessment of algorithm performance. This paper addresses these shortcomings by introducing two key factors affecting VFL performance, feature importance and feature correlation, and by proposing associated evaluation metrics and dataset splitting methods. Additionally, we introduce a real VFL dataset to fill the gap in image-image VFL scenarios. Our comprehensive evaluation of cutting-edge VFL algorithms provides valuable insights for future research in the field.
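To make the idea of a feature split that controls how importance is spread across parties concrete, here is a minimal Python sketch. It assumes each party's expected share of features is drawn from a symmetric Dirichlet(alpha) distribution, so a small alpha gives a skewed split and a large alpha gives a near-even one, and it assumes per-feature importance scores are already available from some attribution method. The functions `dirichlet`, `vertical_split`, and `importance_shares` are hypothetical helpers for illustration only. They are not the VertiBench API, and this is not claimed to be the paper's exact splitting method.

```python
import random
from typing import List, Sequence


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Sample a k-dimensional symmetric Dirichlet(alpha) vector.

    Uses the standard construction: draw k independent Gamma(alpha, 1)
    variables and normalize them so they sum to 1.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def vertical_split(n_features: int, n_parties: int, alpha: float,
                   seed: int = 0) -> List[List[int]]:
    """Hypothetical splitter: assign each feature index to exactly one party.

    Party weights come from Dirichlet(alpha); each feature is then assigned
    independently with those weights. Smaller alpha -> more imbalanced split.
    A party may receive zero features; this sketch does not prevent that.
    """
    rng = random.Random(seed)
    weights = dirichlet(alpha, n_parties, rng)
    parts: List[List[int]] = [[] for _ in range(n_parties)]
    for feature in range(n_features):
        party = rng.choices(range(n_parties), weights=weights)[0]
        parts[party].append(feature)
    return parts


def importance_shares(parts: List[List[int]],
                      importances: Sequence[float]) -> List[float]:
    """Fraction of total feature importance held by each party.

    `importances[f]` is an assumed, precomputed non-negative score for feature f.
    """
    total = sum(importances)
    return [sum(importances[f] for f in part) / total for part in parts]


if __name__ == "__main__":
    # Compare a skewed split (alpha=0.1) with a more balanced one (alpha=100).
    scores = [1.0] * 12  # uniform importance, purely for illustration
    for a in (0.1, 100.0):
        split = vertical_split(n_features=12, n_parties=3, alpha=a, seed=7)
        print(a, split, importance_shares(split, scores))
```

With uniform scores the importance shares simply track how many features each party holds; plugging in real attribution scores would show how a given split concentrates importance, which is the kind of variation the abstract argues arbitrary splits fail to cover.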
