Distortion#
repliclust.distortion#
Provides functions for nonlinearly distorting datasets, so that clusters become non-convex and take on more irregular shapes beyond ellipsoids.
- Functions:
distort()
Distort a dataset.
wrap_around_sphere()
Apply inverse stereographic projection to make the data directional.
- class repliclust.distortion.NeuralNetwork(hidden_dim=64, dim=2, n_layers=50)#
Bases:
Module
Random neural network for data distortion.
This neural network applies a series of linear and non-linear transformations to input data, intended to distort datasets and make clusters take on irregular shapes.
- Parameters:
hidden_dim (int, optional) – The dimensionality of the hidden layers. Defaults to 64.
dim (int, optional) – The input and output dimensionality of the data. Defaults to 2.
n_layers (int, optional) – The number of hidden layers in the network. Defaults to 50.
- forward(x)#
Forward pass of the neural network.
- Parameters:
x (torch.Tensor) – Input tensor of shape (n_samples, dim).
- Returns:
Transformed tensor of shape (n_samples, dim).
- Return type:
torch.Tensor
- training: bool#
- repliclust.distortion.construct_near_ortho_matrix(hidden_dim, scaling_factor=0.1)#
Construct a near-orthogonal matrix.
Generates a random near-orthogonal matrix of size hidden_dim x hidden_dim by perturbing an orthogonal matrix.
- Parameters:
hidden_dim (int) – The dimension of the square matrix to generate.
scaling_factor (float, optional) – Standard deviation of the normal distribution used to generate logarithms of scaling factors. Defaults to 0.1.
- Returns:
A hidden_dim x hidden_dim near-orthogonal matrix of type torch.float32.
- Return type:
torch.Tensor
Notes
The generated matrix is obtained by scaling the eigenvalues of a random orthogonal matrix, ensuring the determinant remains ±1.
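The construction described in the notes can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `near_ortho_sketch` and its `seed` parameter are hypothetical, the random orthogonal matrix is drawn via a QR decomposition, and the log-scaling factors are centered so that the scaling factors multiply to 1. The sketch scales the columns of the orthogonal matrix, so the perturbed quantities are its singular values.

```python
import numpy as np

def near_ortho_sketch(hidden_dim, scaling_factor=0.1, seed=None):
    """Hypothetical stand-in for illustration; not repliclust's code."""
    rng = np.random.default_rng(seed)
    # Random orthogonal matrix from the QR decomposition of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((hidden_dim, hidden_dim)))
    # Logarithms of the scaling factors, drawn from N(0, scaling_factor^2).
    logs = rng.normal(0.0, scaling_factor, size=hidden_dim)
    # Assumption: center the logs so the scaling factors multiply to 1.
    logs -= logs.mean()
    scales = np.exp(logs)
    # Scale column j of Q by scales[j], i.e. compute Q @ diag(scales).
    # det(Q @ diag(scales)) = det(Q) * prod(scales) = +1 or -1.
    return (Q * scales).astype(np.float32)

M = near_ortho_sketch(8, scaling_factor=0.1, seed=0)
abs_det = abs(np.linalg.det(M.astype(np.float64)))
singular_values = np.linalg.svd(M.astype(np.float64), compute_uv=False)
```

With a small `scaling_factor`, the singular values stay close to 1, so the matrix stretches space only mildly in any direction while remaining volume-preserving up to sign.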
- repliclust.distortion.distort(X, hidden_dim=128, n_layers=16, device='cuda', set_seed=None)#
Distort a dataset using a random neural network.
Transforms the input dataset X by passing it through a randomly initialized neural network, causing clusters to take on irregular, non-convex shapes.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to be distorted.
hidden_dim (int, optional) – The dimensionality of the hidden layers in the neural network. Defaults to 128.
n_layers (int, optional) – The number of hidden layers in the neural network. Defaults to 16.
device ({'cuda', 'cpu'}, optional) – The device on which to perform computations. Defaults to 'cuda'.
set_seed (int or None, optional) – Random seed for reproducibility. If None, the random seed is not set.
- Returns:
The distorted data as a tensor of shape (n_samples, n_features).
- Return type:
torch.Tensor
Notes
If CUDA is not available, the device will be automatically switched to 'cpu'.
Examples
Distort a dataset and convert the result to a NumPy array:
>>> X_distorted = distort(X).detach().cpu().numpy()
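To make the idea of distortion by a random network concrete, the following NumPy sketch passes a dataset through randomly drawn linear layers with a tanh nonlinearity. It is only an illustration of the general technique: the helper name `distort_sketch`, the choice of tanh, the orthogonal hidden weights, and the input and output embeddings are all assumptions, and the actual architecture used by `distort()` may differ.

```python
import numpy as np

def distort_sketch(X, hidden_dim=128, n_layers=16, seed=None):
    """Illustrative random-network distortion; not repliclust's implementation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=np.float64)
    dim = X.shape[1]
    # Random linear embedding from the data space into the hidden space.
    W_in = rng.standard_normal((dim, hidden_dim)) / np.sqrt(dim)
    H = X @ W_in
    for _ in range(n_layers):
        # Random orthogonal hidden weights followed by a tanh nonlinearity.
        Q, _ = np.linalg.qr(rng.standard_normal((hidden_dim, hidden_dim)))
        H = np.tanh(H @ Q)
    # Random linear map back to the original dimensionality.
    W_out = rng.standard_normal((hidden_dim, dim)) / np.sqrt(hidden_dim)
    return H @ W_out

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
X_distorted = distort_sketch(X, seed=42)
print(X_distorted.shape)  # (200, 2)
```

Because every weight matrix is drawn from the seeded generator, calling the sketch twice with the same `seed` yields the same output, which mirrors the purpose of the `set_seed` parameter above.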
- repliclust.distortion.wrap_around_sphere(X)#
Apply inverse stereographic projection to data.
Projects the input data X onto the unit sphere using inverse stereographic projection, making the data directional.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to be projected.
- Returns:
The projected data lying on the unit sphere.
- Return type:
ndarray of shape (n_samples, n_features + 1)
Notes
The inverse stereographic projection maps points from Euclidean space onto the sphere. The output data will have one additional dimension compared to the input.
Examples
Project a 2D dataset onto the sphere:
>>> X_spherical = wrap_around_sphere(X)
Verify that the projected points lie on the unit sphere:
>>> np.allclose(np.linalg.norm(X_spherical, axis=1), 1.0)
True
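For reference, the standard inverse stereographic projection (projecting from the north pole) maps a point x to (2x / (|x|^2 + 1), (|x|^2 - 1) / (|x|^2 + 1)). The sketch below implements that textbook formula in NumPy. The helper name `inverse_stereographic_sketch` is hypothetical, and the library's `wrap_around_sphere()` may use a different pole, scaling, or coordinate ordering.

```python
import numpy as np

def inverse_stereographic_sketch(X):
    """Textbook inverse stereographic projection; not repliclust's code."""
    X = np.asarray(X, dtype=np.float64)
    # Squared Euclidean norm of each sample, kept as a column for broadcasting.
    sq_norm = np.sum(X**2, axis=1, keepdims=True)
    # First n coordinates: 2x / (|x|^2 + 1).
    head = 2.0 * X / (sq_norm + 1.0)
    # Extra coordinate: (|x|^2 - 1) / (|x|^2 + 1).
    tail = (sq_norm - 1.0) / (sq_norm + 1.0)
    return np.hstack([head, tail])

rng = np.random.default_rng(0)
X_flat = rng.standard_normal((100, 2))
X_sphere = inverse_stereographic_sketch(X_flat)
print(X_sphere.shape)  # (100, 3)
print(np.allclose(np.linalg.norm(X_sphere, axis=1), 1.0))  # True
```

Under this formula the origin maps to the south pole and points far from the origin approach the north pole, which is why the output has one more dimension than the input.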