apollon.som.utilities module

apollon/som/utilities.py

Utilities for self-organizing maps.

apollon.som.utilities.best_match(weights: numpy.ndarray, inp: numpy.ndarray, metric: str)

Compute the best matching unit of weights for each element in inp.

If several elements in weights have the same distance to the current element of inp, the first of them is chosen as the best matching unit.

Parameters

weights – Two-dimensional array of weights, in which each row represents a unit.
inp – Array of test vectors. If two-dimensional, rows are assumed to represent observations.
metric – Distance metric to use.

Returns

Index and error of best matching units.
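
The tie-breaking rule can be illustrated with a minimal, dependency-free sketch. This is not the library's implementation: the name best_match_sketch is hypothetical, plain lists stand in for NumPy arrays, and the Euclidean distance is assumed in place of the metric argument.

```python
import math
from typing import List, Sequence, Tuple


def best_match_sketch(
    weights: Sequence[Sequence[float]],
    inp: Sequence[Sequence[float]],
) -> Tuple[List[int], List[float]]:
    """Return index and distance of the closest unit for each input row.

    Hypothetical helper; the Euclidean metric is an assumption.
    """
    indices: List[int] = []
    errors: List[float] = []
    for vec in inp:
        # Distance from this observation to every unit (row of weights).
        dists = [math.dist(unit, vec) for unit in weights]
        # min() returns the first minimum, so ties go to the lowest unit index.
        best = min(range(len(dists)), key=dists.__getitem__)
        indices.append(best)
        errors.append(dists[best])
    return indices, errors


weights = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
inp = [[1.0, 0.0], [0.1, 1.9]]
bmu, err = best_match_sketch(weights, inp)
```

The first test vector is equally far from units 0 and 1, so unit 0 is reported; the second vector is closest to unit 2.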

apollon.som.utilities.decrease_expo(start: float, step: float, stop: float = 1.0) → Iterator[float]

Exponentially decrease start in step steps to stop.

apollon.som.utilities.decrease_linear(start: float, step: float, stop: float = 1.0) → Iterator[float]

Linearly decrease start in step steps to stop.
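
The two schedules can be sketched as generators. The names linear_decay and expo_decay and the exact formulas are assumptions made for illustration; the library's own functions may compute the intermediate values differently. Both sketches assume at least two steps, and the exponential one additionally assumes that start and stop are positive.

```python
import math
from typing import Iterator


def linear_decay(start: float, n_steps: int, stop: float = 1.0) -> Iterator[float]:
    """Yield n_steps values on a straight line from start to stop (hypothetical)."""
    slope = (stop - start) / (n_steps - 1)
    for k in range(n_steps):
        yield start + slope * k


def expo_decay(start: float, n_steps: int, stop: float = 1.0) -> Iterator[float]:
    """Yield n_steps values on an exponential curve from start to stop (hypothetical)."""
    # Choose the rate so that the last value equals stop.
    rate = math.log(stop / start) / (n_steps - 1)
    for k in range(n_steps):
        yield start * math.exp(rate * k)


lin = list(linear_decay(10.0, 4))
exp = list(expo_decay(8.0, 4))
```

Under these assumptions the linear schedule shrinks by a constant difference per step, whereas the exponential schedule shrinks by a constant factor per step.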

apollon.som.utilities.distribute(bmu_idx: Iterable[int], n_units: int) → Dict[int, List[int]]

List training data matches per SOM unit.

This method assumes that the ith element of bmu_idx corresponds to the ith vector in an array of input data vectors.

Units without any matches map to an empty list.

Parameters

bmu_idx – Indices of best matching units.
n_units – Number of units on the SOM.

Returns

Dictionary in which the keys represent the flat indices of SOM units. The corresponding value is a list of indices of those training data vectors that have been mapped to this unit.
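
The mapping described above can be sketched in a few lines. The name distribute_sketch is hypothetical, and the code only mirrors the documented behaviour; it is not taken from the library.

```python
from typing import Dict, Iterable, List


def distribute_sketch(bmu_idx: Iterable[int], n_units: int) -> Dict[int, List[int]]:
    """Group training vector indices by their best matching unit (hypothetical)."""
    # Start with an empty list for every unit so unmatched units still appear.
    matches: Dict[int, List[int]] = {unit: [] for unit in range(n_units)}
    # The position within bmu_idx is the index of the training vector.
    for data_idx, unit in enumerate(bmu_idx):
        matches[unit].append(data_idx)
    return matches


result = distribute_sketch([0, 2, 2, 1, 0], n_units=4)
# result == {0: [0, 4], 1: [3], 2: [1, 2], 3: []}
```

Unit 3 receives no training vector in this example and therefore maps to an empty list.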

apollon.som.utilities.grid(n_rows: int, n_cols: int) → numpy.ndarray

Compute grid indices of a two-dimensional array.

Parameters

n_rows – Number of array rows.
n_cols – Number of array columns.

Returns

Two-dimensional array in which each row represents a multi-index.

apollon.som.utilities.grid_iter(n_rows: int, n_cols: int) → Iterator[Tuple[int, int]]

Compute grid indices of a two-dimensional array.

Parameters

n_rows – Number of array rows.
n_cols – Number of array columns.

Returns

Multi-index iterator.
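
A multi-index iterator of this kind can be sketched with itertools.product. The name grid_iter_sketch is hypothetical, and the row-major order (the column index varies fastest) is an assumption; the source does not state the order in which the library yields indices.

```python
from itertools import product
from typing import Iterator, Tuple


def grid_iter_sketch(n_rows: int, n_cols: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, col) pairs in row-major order (hypothetical helper)."""
    return product(range(n_rows), range(n_cols))


pairs = list(grid_iter_sketch(2, 3))
# pairs == [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

# Under the row-major assumption, the flat unit index is row * n_cols + col.
flat = [row * 3 + col for row, col in pairs]
```

Collecting the pairs into a list gives the rows that an array-valued variant such as grid would hold, again assuming the same ordering.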

apollon.som.utilities.sample_hist(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Sample sum-normalized histograms.

Parameters

dims – Dimensions of SOM.
data – Input data set.

Returns

Two-dimensional array in which each row is a histogram.

apollon.som.utilities.sample_pca(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Compute initial SOM weights by sampling from the first two principal components of the input data.

Parameters

dims – Dimensions of SOM.
data – Input data set.
adapt – If True, the largest value of shape is applied to the principal component with the largest singular value. This orients the map such that the map dimension with the most units coincides with the principal component with the largest variance.

Returns

Array of SOM weights.

apollon.som.utilities.sample_rnd(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Compute initial SOM weights by sampling uniformly from the data space.

Parameters

dims – Dimensions of SOM.
data – Input data set. If None, sample from [-10, 10].

Returns

Array of SOM weights.
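
Uniform initialization can be sketched as follows. Several points are assumptions rather than documented facts: dims is read as (n_rows, n_cols, n_features), the "data space" is taken to be the per-feature range between the minimum and maximum of data, and the result holds one weight vector per unit. The name sample_rnd_sketch and the seed parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_rnd_sketch(
    dims: Tuple[int, int, int],
    data: Optional[Sequence[Sequence[float]]] = None,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Draw one uniform weight vector per unit (hypothetical helper)."""
    n_rows, n_cols, n_features = dims  # assumed meaning of dims
    rng = random.Random(seed)
    if data is None:
        # Documented fallback: sample from [-10, 10].
        lows = [-10.0] * n_features
        highs = [10.0] * n_features
    else:
        # Assumed reading of "data space": per-feature bounding box.
        columns = list(zip(*data))
        lows = [min(col) for col in columns]
        highs = [max(col) for col in columns]
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        for _ in range(n_rows * n_cols)
    ]


weights = sample_rnd_sketch((2, 3, 4), seed=0)
bounded = sample_rnd_sketch((1, 2, 2), data=[[0.0, 5.0], [1.0, 7.0], [0.5, 6.0]], seed=0)
```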

apollon.som.utilities.sample_stm(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Compute initial SOM weights by sampling stochastic matrices from a Dirichlet distribution.

The rows of each n-by-n stochastic matrix are samples drawn from the Dirichlet distribution, where n is the number of rows and columns of the matrix. The diagonal elements of the matrices are set to twice the probability of the remaining elements. The square root of the weight vectors’ size must be an integer.

Parameters

dims – Dimensions of SOM.
data – Input data set.

Returns

Array of SOM weights.

Notes

Each row of the output array is to be considered a flattened stochastic matrix: with N = sqrt(data.shape[1]), each consecutive block of N values is a discrete probability distribution that forms one row of the matrix.
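
The flattened layout can be illustrated with a standard-library sketch that builds a single weight vector. It relies on the fact that normalizing independent gamma variates yields a Dirichlet sample. The names dirichlet and stochastic_weight_sketch are hypothetical, and the choice of concentration parameters (2 on the diagonal, 1 elsewhere) is only one possible reading of "twice the probability"; the library may weight the diagonal differently.

```python
import math
import random
from typing import List, Optional


def dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def stochastic_weight_sketch(n_features: int, seed: Optional[int] = None) -> List[float]:
    """Return one flattened n-by-n stochastic matrix (hypothetical helper)."""
    n = math.isqrt(n_features)
    if n * n != n_features:
        raise ValueError("n_features must be a perfect square")
    rng = random.Random(seed)
    flat: List[float] = []
    for i in range(n):
        # Assumed reading: concentration 2 on the diagonal, 1 elsewhere.
        alpha = [2.0 if j == i else 1.0 for j in range(n)]
        flat.extend(dirichlet(alpha, rng))
    return flat


vec = stochastic_weight_sketch(9, seed=1)
# Recover the matrix rows: each block of 3 values is one probability distribution.
rows = [vec[i:i + 3] for i in range(0, 9, 3)]
```

Each recovered row sums to one, and a weight vector whose length is not a perfect square is rejected, which mirrors the constraint stated above.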