Synthetic Data for Cluster Analysis
repliclust is a Python package for generating synthetic datasets with clusters. It lets you generate many different datasets that are geometrically similar, without ever touching low-level parameters such as cluster centroids or covariance matrices.
Features
Reproducibly generate clusters with defined geometric characteristics
Manage cluster overlaps, shapes, and probability distributions through intuitive, high-level controls
Define custom dataset archetypes to power reproducible and informative benchmarks
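To make the idea behind these features concrete, here is a minimal, self-contained sketch of the archetype pattern using only the Python standard library. It is not repliclust's API: `ToyArchetype`, `synthesize`, and all parameter names are hypothetical and exist only for illustration. The sketch covers reproducibility and cluster shape (through an aspect ratio) and makes no attempt to control cluster overlap.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArchetype:
    """Hypothetical high-level dataset description (not repliclust's class)."""
    n_clusters: int = 3
    dim: int = 2
    n_samples: int = 300
    aspect_ratio: float = 3.0  # longest / shortest standard deviation per cluster
    spread: float = 10.0       # side length of the box that centroids are drawn from


def synthesize(archetype, seed):
    """Draw one dataset (X, y); low-level parameters are sampled internally."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(archetype.n_clusters):
        # Centroids are drawn here, so the caller never sets them directly.
        center = [rng.uniform(0.0, archetype.spread) for _ in range(archetype.dim)]
        # Axis-aligned standard deviations: one long axis, the rest short.
        stds = [1.0] * archetype.dim
        stds[rng.randrange(archetype.dim)] = archetype.aspect_ratio
        # Split the sample budget as evenly as possible across clusters.
        size = archetype.n_samples // archetype.n_clusters
        if label < archetype.n_samples % archetype.n_clusters:
            size += 1
        for _ in range(size):
            X.append([rng.gauss(c, s) for c, s in zip(center, stds)])
            y.append(label)
    return X, y


# One archetype, several datasets: same high-level geometry, different draws.
archetype = ToyArchetype(n_clusters=4, dim=2, n_samples=200, aspect_ratio=3.0)
X_a, y_a = synthesize(archetype, seed=0)
X_b, y_b = synthesize(archetype, seed=1)
X_a_again, y_a_again = synthesize(archetype, seed=0)  # same seed, same data
```

Here the archetype fixes only the high-level description, while each seed yields a different dataset that shares it. Reusing a seed reproduces the dataset exactly, which is the property a benchmark depends on.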
Reference
Check out our paper here.