Distortion

repliclust.distortion

Provides functions for nonlinearly distorting datasets, so that clusters become non-convex and take on more irregular shapes beyond ellipsoids.

Functions:

distort(): Distort a dataset.
wrap_around_sphere(): Apply inverse stereographic projection to make the data directional.

class repliclust.distortion.NeuralNetwork(hidden_dim=64, dim=2, n_layers=50)

Bases: Module

Random neural network for data distortion.

This neural network applies a series of linear and non-linear transformations to input data, intended to distort datasets and make clusters take on irregular shapes.

Parameters:

hidden_dim (int, optional) – The dimensionality of the hidden layers. Defaults to 64.
dim (int, optional) – The input and output dimensionality of the data. Defaults to 2.
n_layers (int, optional) – The number of hidden layers in the network. Defaults to 50.

forward(x)

Forward pass of the neural network.

Parameters:

x (torch.Tensor) – Input tensor of shape (n_samples, dim).

Returns:

Transformed tensor of shape (n_samples, dim).

Return type:

torch.Tensor
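The forward pass described above (a chain of linear maps and nonlinearities that keeps the data dimensionality unchanged) can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the function name random_network_forward, the tanh activation, the Gaussian weight initialization, the seed argument, and the smaller default sizes are all assumptions chosen for illustration.

```python
import numpy as np

def random_network_forward(x, hidden_dim=32, n_layers=8, seed=0):
    """Pass x of shape (n_samples, dim) through a random tanh network.

    Hypothetical stand-in for NeuralNetwork.forward; weights and
    activation are illustrative assumptions, not the library's choices.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    # Random linear embedding from the data space into the hidden space.
    w_in = rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, hidden_dim))
    h = x @ w_in
    for _ in range(n_layers):
        # Each hidden layer: random linear map followed by a tanh nonlinearity.
        w = rng.normal(scale=1.0 / np.sqrt(hidden_dim), size=(hidden_dim, hidden_dim))
        h = np.tanh(h @ w)
    # Random linear projection back to the original dimensionality.
    w_out = rng.normal(scale=1.0 / np.sqrt(hidden_dim), size=(hidden_dim, dim))
    return h @ w_out

# Example input: 200 points in 2D.
X = np.random.default_rng(1).normal(size=(200, 2))
Y = random_network_forward(X)
```

The output Y has the same shape as X, mirroring the (n_samples, dim) contract documented for forward(x).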

training: bool

repliclust.distortion.construct_near_ortho_matrix(hidden_dim, scaling_factor=0.1)

Construct a near-orthogonal matrix.

Generates a random near-orthogonal matrix of size hidden_dim x hidden_dim by perturbing an orthogonal matrix.

Parameters:

hidden_dim (int) – The dimension of the square matrix to generate.
scaling_factor (float, optional) – Standard deviation of the normal distribution used to generate logarithms of scaling factors. Defaults to 0.1.

Returns:

A hidden_dim x hidden_dim near-orthogonal matrix of type torch.float32.

Return type:

torch.Tensor

Notes

The matrix is obtained by rescaling the eigenvalues of a random orthogonal matrix in such a way that the determinant remains ±1.
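One way a matrix with these properties could be built is sketched below in NumPy. It is an assumption-laden illustration rather than the library's code: the name near_ortho_sketch, the seed argument, the use of QR decomposition to obtain orthogonal factors, and the centering of the log-scales are all choices made here. Centering the log-scales makes them sum to zero, so the scaling step has determinant 1 and the product keeps a determinant of ±1.

```python
import numpy as np

def near_ortho_sketch(hidden_dim, scaling_factor=0.1, seed=0):
    """Build a near-orthogonal matrix with |det| = 1 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix yields a random orthogonal factor.
    q_left, _ = np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))
    q_right, _ = np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))
    # Log-scales drawn from N(0, scaling_factor^2), centered so they sum to zero.
    log_scales = rng.normal(scale=scaling_factor, size=hidden_dim)
    log_scales -= log_scales.mean()
    # Orthogonal * diagonal * orthogonal; |det| = exp(sum(log_scales)) = 1.
    m = q_left @ np.diag(np.exp(log_scales)) @ q_right
    return m.astype(np.float32)

M = near_ortho_sketch(8)
```

A smaller scaling_factor keeps the scales closer to 1, so the result stays closer to an exactly orthogonal matrix.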

repliclust.distortion.distort(X, hidden_dim=128, n_layers=16, device='cuda', set_seed=None)

Distort a dataset using a random neural network.

Transforms the input dataset X by passing it through a randomly initialized neural network, causing clusters to take on irregular, non-convex shapes.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input data to be distorted.
hidden_dim (int, optional) – The dimensionality of the hidden layers in the neural network. Defaults to 128.
n_layers (int, optional) – The number of hidden layers in the neural network. Defaults to 16.
device ({'cuda', 'cpu'}, optional) – The device on which to perform computations. Defaults to 'cuda'.
set_seed (int or None, optional) – Random seed for reproducibility. If None, the random seed is not set. Defaults to None.

Returns:

The distorted data as a tensor of shape (n_samples, n_features).

Return type:

torch.Tensor

Notes

If CUDA is not available, the computation automatically falls back to 'cpu'.

Examples

Distort a dataset and convert the result to a NumPy array:

>>> from repliclust.distortion import distort
>>> X_distorted = distort(X).numpy()
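The CPU fallback mentioned in the Notes can be pictured with a small helper. This is a hypothetical sketch, not code from the library: resolve_device and its cuda_available argument are invented here so the logic runs without PyTorch installed. In a real PyTorch program the availability flag would typically come from torch.cuda.is_available().

```python
def resolve_device(requested="cuda", cuda_available=False):
    """Return the device name to use, falling back to 'cpu' when CUDA is missing.

    Hypothetical helper; `cuda_available` stands in for a runtime check
    such as torch.cuda.is_available().
    """
    if requested not in ("cuda", "cpu"):
        raise ValueError(f"unsupported device: {requested!r}")
    if requested == "cuda" and not cuda_available:
        # Requested GPU is unavailable, so run on the CPU instead.
        return "cpu"
    return requested

fallback = resolve_device("cuda", cuda_available=False)
gpu = resolve_device("cuda", cuda_available=True)
```

As a general PyTorch rule, a tensor that lives on a GPU must be moved with .cpu() before .numpy() can be called on it. Whether the tensor returned by distort() is already on the CPU is not stated here, so calling .cpu() first is a safe habit; it is a no-op for CPU tensors.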

repliclust.distortion.wrap_around_sphere(X)

Apply inverse stereographic projection to data.

Projects the input data X onto the unit sphere using inverse stereographic projection, making the data directional.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input data to be projected.

Returns:

The projected data lying on the unit sphere.

Return type:

ndarray of shape (n_samples, n_features + 1)

Notes

The inverse stereographic projection maps points from Euclidean space onto the sphere. The output data will have one additional dimension compared to the input.

Examples

Project a 2D dataset onto the sphere:

>>> from repliclust.distortion import wrap_around_sphere
>>> X_spherical = wrap_around_sphere(X)

Verify that the projected points lie on the unit sphere:

>>> import numpy as np
>>> np.allclose(np.linalg.norm(X_spherical, axis=1), 1.0)
True
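For reference, the standard inverse stereographic projection can be written in a few lines of NumPy. The sketch below is an illustration under assumptions, not the library's implementation: the function name inverse_stereographic is hypothetical, and the convention used (projection pole at the last coordinate equal to 1, extra coordinate appended last) may differ from what wrap_around_sphere() does internally. A point x maps to (2x, |x|^2 - 1) / (|x|^2 + 1), which always has unit norm.

```python
import numpy as np

def inverse_stereographic(X):
    """Map rows of X in R^n onto the unit sphere in R^(n+1) (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    sq_norm = np.sum(X**2, axis=1, keepdims=True)
    # First n coordinates: 2x / (|x|^2 + 1); last one: (|x|^2 - 1) / (|x|^2 + 1).
    return np.hstack([2.0 * X, sq_norm - 1.0]) / (sq_norm + 1.0)

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
S = inverse_stereographic(X)
```

Under this convention the origin lands on the pole opposite the projection point, points on the unit circle land on the equator, and points far from the origin approach the projection pole.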