Papers
arxiv:2501.12886

Multi-Platform Aggregated Dataset of Online Communities (MADOC)

Published on Jan 22
Authors:
,
,
,
,
,
,
,
,
,

Abstract

The Multi-platform Aggregated Dataset of Online Communities (MADOC) is a comprehensive dataset that facilitates computational social science research by providing FAIR-compliant standardized access to cross-platform analysis of online social dynamics. MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users. The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis. By providing UUID-anonymized user histories and temporal alignment of banned communities' activity patterns, MADOC supports research on content moderation impacts and platform migration trends. Distributed via Zenodo with persistent identifiers and Python/R toolkits, the dataset adheres to FAIR principles while addressing post-API-era research challenges through ethical aggregation of public social media archives.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2501.12886 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2501.12886 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2501.12886 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.