Cluster Overlap Control
repliclust.overlap.centers
This module implements a ClusterCenterSampler that places cluster centers by minimizing an objective function, so that clusters attain the desired degree of pairwise overlap.
- class repliclust.overlap.centers.ConstrainedOverlapCenters(max_overlap=0.1, min_overlap=0.09, packing=0.1, learning_rate=0.1, linear_penalty_weight=0.5, overlap_mode='lda', max_epoch=100, n_restarts=3, ATOL=1e-10, RTOL=0.001)
Bases: ClusterCenterSampler
This class optimizes the locations of cluster centers so that the overlap between pairs of clusters reaches the desired degree.
- Parameters:
max_overlap (float between 0 and 1) – The maximum allowed overlap between any two clusters, measured as a fraction of cluster mass.
min_overlap (float) – Minimum degree of overlap each cluster should have with its closest neighbor, measured as a fraction of cluster mass.
packing (float) – Sets the ratio of total cluster volume to the sampling volume. Used when choosing random cluster centers for initializing the optimization.
learning_rate (float) – The rate at which cluster centers are optimized. If numerical instabilities appear, it is recommended to lower this number.
linear_penalty_weight (float) – The weight for the linear penalty in the overlap loss. If zero, the overlap loss carries only a quadratic penalty, so minimization cannot make the loss vanish exactly; in this case, minimization stops once the overlap loss is within tolerance ATOL of zero.
overlap_mode ({'lda', 'c2c', 'exact'}) – Method for calculating cluster overlap.
max_epoch (int) – The maximum number of optimization epochs to run. Increasing this number may slow down the optimization.
n_restarts (int) – Number of times to repeat the optimization, each time with a different random initialization. The final result is the best result attained among the n_restarts runs.
ATOL (float) – Absolute numerical tolerance for optimization.
RTOL (float) – Relative numerical tolerance for optimization.
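The effect of linear_penalty_weight described above can be illustrated with a toy one-dimensional problem. This is only a sketch under stated assumptions: the penalty form `w*v + (1 - w)*v**2` on a scalar "violation" `v`, and the helper names `penalty_grad` and `descend`, are hypothetical and not taken from repliclust. It shows why plain gradient descent on a purely quadratic penalty shrinks the violation geometrically without ever reaching zero, while a linear term lets it reach zero in a finite number of steps.

```python
def penalty_grad(violation, linear_weight):
    """Gradient of the ASSUMED toy penalty w*v + (1 - w)*v**2 for v > 0, else 0."""
    if violation <= 0.0:
        return 0.0
    return linear_weight + 2.0 * (1.0 - linear_weight) * violation


def descend(violation, linear_weight, learning_rate=0.1, max_epoch=100):
    """Run plain gradient descent on the violation.

    Returns (final violation, number of epochs used).
    """
    for epoch in range(max_epoch):
        if violation <= 0.0:
            return 0.0, epoch
        step = learning_rate * penalty_grad(violation, linear_weight)
        # Clip at zero: a negative violation means the constraint is satisfied.
        violation = max(0.0, violation - step)
    return violation, max_epoch


# Quadratic penalty only: the violation decays by a constant factor per epoch.
quadratic_only = descend(0.05, linear_weight=0.0)

# Mixed penalty: the constant gradient term drives the violation to zero.
with_linear = descend(0.05, linear_weight=0.5)

print("quadratic only:", quadratic_only)
print("with linear term:", with_linear)
```

In the quadratic-only run the loop exhausts max_epoch with a tiny but nonzero violation, which is the situation where an absolute tolerance such as ATOL is needed as a stopping rule.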
- sample_cluster_centers(archetype, quiet=False)
Sample cluster centers for the given archetype.
- repliclust.overlap.centers.assess_obs_overlap(centers, cov_list, ave_cov_inv_list, mode='lda')
Compute the observed minimum and maximum overlap between clusters.
- Parameters:
centers (ndarray) – The cluster centers arranged as a matrix. Each row is a center.
cov_list (list[ndarray]) – A list whose i-th entry is the covariance matrix of the i-th cluster.
ave_cov_inv_list (list[ndarray]) – A list with k*(k-1)/2 entries that gives the inverses of the average covariance matrices between each distinct pair of clusters.
mode ({'lda', 'c2c'}) – The method for calculating cluster overlap.
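To make the pairwise bookkeeping concrete, the following sketch enumerates the k*(k-1)/2 distinct cluster pairs and summarizes a set of made-up pairwise overlaps. It rests on several assumptions: the pair ordering produced by `itertools.combinations` is not necessarily the ordering repliclust uses for ave_cov_inv_list, the helper `summarize_overlaps` is hypothetical, and the overlap values are invented for illustration. Following the definitions of min_overlap and max_overlap above, the observed minimum is taken as the smallest nearest-neighbor overlap across clusters, and the observed maximum as the largest overlap across all pairs.

```python
from itertools import combinations


def summarize_overlaps(pair_overlaps, k):
    """Summarize pairwise overlaps for k clusters (hypothetical helper).

    pair_overlaps maps (i, j) with i < j to the overlap between clusters i and j.
    Returns (observed minimum, observed maximum), where the minimum is the
    smallest overlap any cluster has with its closest neighbor.
    """
    nearest = [0.0] * k
    for (i, j), overlap in pair_overlaps.items():
        nearest[i] = max(nearest[i], overlap)
        nearest[j] = max(nearest[j], overlap)
    return min(nearest), max(pair_overlaps.values())


k = 4
# Distinct pairs: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
pairs = list(combinations(range(k), 2))

# Invented overlap values, one per pair, in the order listed above.
overlaps = dict(zip(pairs, [0.10, 0.02, 0.01, 0.03, 0.095, 0.04]))

obs_min, obs_max = summarize_overlaps(overlaps, k)
print(len(pairs), obs_min, obs_max)
```

With these invented numbers, cluster 2 overlaps its closest neighbor by only 0.04, so a requirement such as min_overlap=0.09 would not be met even though no pair exceeds max_overlap=0.1.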
- repliclust.overlap.centers.overlap2quantile_vec(overlaps)
Convert overlaps to the corresponding quantiles.
- repliclust.overlap.centers.quantile2overlap_vec(quantiles)
Convert quantiles to the corresponding overlaps.
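The two conversions above are inverses of each other. The page does not state the exact convention, so the sketch below assumes that an overlap is the two-sided tail mass of a standard normal distribution beyond the quantile, i.e. overlap = 2 * (1 - Phi(q)). The function names `overlap_to_quantile` and `quantile_to_overlap` are hypothetical scalar stand-ins, not the library's vectorized functions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def overlap_to_quantile(overlap):
    """Map an overlap in (0, 1) to a quantile, ASSUMING overlap = 2 * (1 - Phi(q))."""
    return _STD_NORMAL.inv_cdf(1.0 - overlap / 2.0)


def quantile_to_overlap(quantile):
    """Inverse of overlap_to_quantile under the same assumption."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(quantile))


overlaps = [0.001, 0.05, 0.1, 0.5]
quantiles = [overlap_to_quantile(o) for o in overlaps]
recovered = [quantile_to_overlap(q) for q in quantiles]

for o, q, r in zip(overlaps, quantiles, recovered):
    print(f"overlap={o:<6} quantile={q:.4f} round trip={r:.6f}")
```

Under this assumed convention, smaller overlaps map to larger quantiles, which matches the intuition that well-separated clusters sit more standard deviations apart.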