Cluster Overlap Control

repliclust.overlap.centers

This module implements a ClusterCenterSampler that places cluster centers to achieve a desired degree of pairwise overlap between clusters by minimizing an objective function.

class repliclust.overlap.centers.ConstrainedOverlapCenters(max_overlap=0.1, min_overlap=0.09, packing=0.1, learning_rate=0.1, linear_penalty_weight=0.5, overlap_mode='lda', max_epoch=100, n_restarts=3, ATOL=1e-10, RTOL=0.001)

Bases: ClusterCenterSampler

Optimizes the locations of cluster centers so that pairs of clusters attain the desired degrees of overlap.

Parameters:
  • max_overlap (float between 0 and 1) – The maximum allowed overlap between any two clusters, measured as a fraction of cluster mass.

  • min_overlap (float) – Minimum degree of overlap each cluster should have with its closest neighbor, measured as a fraction of cluster mass.

  • packing (float) – Sets the ratio of total cluster volume to the sampling volume. Used when choosing random cluster centers for initializing the optimization.

  • learning_rate (float) – Step size for the updates to the cluster centers during optimization. Lower this value if numerical instabilities appear.

  • linear_penalty_weight (float) – The weight for the linear penalty in the overlap loss. If zero, the overlap loss carries only a quadratic penalty and minimization cannot make the overlap loss vanish exactly; in this case, minimization stops when the overlap loss is almost zero (within tolerance ATOL).

  • overlap_mode ({'lda', 'c2c', 'exact'}) – Method for calculating cluster overlap.

  • max_epoch (int) – The maximum number of optimization epochs to run. Larger values can increase the run time.

  • n_restarts (int) – Number of times to repeat the optimization, each time with a different random initialization. The final result is the best result attained among the n_restarts runs.

  • ATOL (float) – Absolute numerical tolerance for optimization.

  • RTOL (float) – Relative numerical tolerance for optimization.
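
The remark on linear_penalty_weight can be illustrated with a toy one-dimensional problem. The sketch below is not the library's loss: it assumes a hinge-style penalty of the form w*v + (1 - w)*v**2 on a scalar overlap violation v, and the names penalty_grad and descend are hypothetical. It only shows why a purely quadratic penalty approaches zero without reaching it, while a linear term drives the violation to zero in a finite number of steps.

```python
def penalty_grad(violation, linear_weight):
    """Gradient of the assumed penalty w*v + (1 - w)*v**2, active only for v > 0."""
    if violation <= 0.0:
        return 0.0
    return linear_weight + 2.0 * (1.0 - linear_weight) * violation


def descend(violation, linear_weight, learning_rate=0.1, max_epoch=100):
    """Run plain gradient descent on the violation.

    Returns (final violation, number of epochs used).
    """
    for epoch in range(max_epoch):
        if violation <= 0.0:
            # The constraint is satisfied exactly; stop early.
            return 0.0, epoch
        violation -= learning_rate * penalty_grad(violation, linear_weight)
    return max(violation, 0.0), max_epoch


# Linear plus quadratic penalty: the gradient stays bounded away from zero,
# so the violation crosses zero after finitely many steps.
mixed = descend(1.0, linear_weight=0.5)

# Quadratic penalty only: each step shrinks the violation by a constant
# factor, so it stays positive and only a tolerance can end the run.
quadratic_only = descend(1.0, linear_weight=0.0)

print("mixed:", mixed)
print("quadratic only:", quadratic_only)
```

In the quadratic-only case, a stopping rule based on an absolute tolerance such as ATOL is what ends the optimization before max_epoch.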

sample_cluster_centers(archetype, quiet=False)

Sample cluster centers for the given archetype.

repliclust.overlap.centers.assess_obs_overlap(centers, cov_list, ave_cov_inv_list, mode='lda')

Compute the observed minimum and maximum overlap between clusters.

Parameters:
  • centers (ndarray) – The cluster centers arranged as a matrix. Each row is a center.

  • cov_list (list[ndarray]) – A list whose i-th entry is the covariance matrix of the i-th cluster.

  • ave_cov_inv_list (list[ndarray]) – A list with k*(k-1)/2 entries that gives the inverses of the average covariance matrices between each distinct pair of clusters.

  • mode ({'lda', 'c2c'}) – The method for calculating cluster overlap.
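
As a rough illustration of the shape of ave_cov_inv_list, the sketch below builds one inverse per distinct pair of clusters with NumPy. The helper name pairwise_average_cov_inverses is hypothetical, and the pair ordering (that of itertools.combinations) is an assumption; the ordering the library expects is not stated here.

```python
from itertools import combinations

import numpy as np


def pairwise_average_cov_inverses(cov_list):
    """Invert the average covariance matrix of every distinct cluster pair.

    Pairs are visited in the order (0, 1), (0, 2), ..., (k-2, k-1);
    this ordering is an assumption made for the example.
    """
    return [
        np.linalg.inv((cov_list[i] + cov_list[j]) / 2.0)
        for i, j in combinations(range(len(cov_list)), 2)
    ]


# Three clusters in two dimensions, so k*(k-1)/2 = 3 pairs.
cov_list = [np.eye(2), 2.0 * np.eye(2), np.diag([1.0, 3.0])]
ave_cov_inv_list = pairwise_average_cov_inverses(cov_list)

print(len(ave_cov_inv_list))
```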

repliclust.overlap.centers.overlap2quantile_vec(overlaps)

Convert overlaps to the corresponding quantiles.

repliclust.overlap.centers.quantile2overlap_vec(quantiles)

Convert quantiles to the corresponding overlaps.
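
The two conversions are inverses of each other. The exact mapping is not given above, so the following is only a sketch under an assumption: that an overlap a corresponds to the standard normal quantile q with a = 2 * (1 - Phi(q)), where Phi is the standard normal CDF. The function names are hypothetical stand-ins, not the library's implementation.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def overlap_to_quantile(overlap):
    """Map an overlap in (0, 1) to a quantile, assuming overlap = 2*(1 - Phi(q))."""
    return _STD_NORMAL.inv_cdf(1.0 - overlap / 2.0)


def quantile_to_overlap(quantile):
    """Inverse of overlap_to_quantile under the same assumption."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(quantile))


overlaps = [0.09, 0.1]
quantiles = [overlap_to_quantile(a) for a in overlaps]
recovered = [quantile_to_overlap(q) for q in quantiles]

print(quantiles)
print(recovered)
```

Under this assumed mapping, a smaller overlap corresponds to a larger quantile, which is consistent with more widely separated clusters overlapping less.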