apollon.som.utilities module

apollon/som/utilities.py

Utilities for self-organizing maps.

Licensed under the terms of the BSD-3-Clause license. Copyright (C) 2019 Michael Blaß mblass@posteo.net

apollon.som.utilities.best_match(weights: numpy.ndarray, inp: numpy.ndarray, metric: str)

Compute the best matching unit of weights for each element in inp.

If several elements in weights have the same distance to the current element of inp, the first of them is chosen as the best matching unit.

Parameters
  • weights – Two-dimensional array of weights, in which each row represents a unit.

  • inp – Array of test vectors. If two-dimensional, rows are assumed to represent observations.

  • metric – Distance metric to use.

Returns

Index and error of best matching units.
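The lookup described above can be sketched with plain NumPy. This is a hypothetical re-implementation for illustration, not the code in apollon: the name best_match_sketch is made up, and the metric is fixed to the Euclidean distance instead of being selectable through metric.

```python
import numpy as np


def best_match_sketch(weights, inp):
    """Illustrative only: Euclidean best matching units (not apollon's code)."""
    inp = np.atleast_2d(inp)                        # one row per observation
    diffs = weights[:, None, :] - inp[None, :, :]   # (n_units, n_obs, n_features)
    dists = np.linalg.norm(diffs, axis=2)           # (n_units, n_obs)
    bmu = dists.argmin(axis=0)                      # argmin keeps the first minimum on ties
    err = dists[bmu, np.arange(inp.shape[0])]       # distance to the chosen unit
    return bmu, err


# Units 0 and 2 are identical, so the tie for the first observation goes to unit 0.
weights = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
observations = np.array([[0.1, 0.0], [0.9, 1.0]])
bmu, err = best_match_sketch(weights, observations)
```

Because argmin returns the first occurrence of the minimum, the tie-breaking rule stated above falls out of the sketch without extra code.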

apollon.som.utilities.decrease_expo(start: float, step: float, stop: float = 1.0) → Iterator[float]

Exponentially decrease start in step steps to stop.

apollon.som.utilities.decrease_linear(start: float, step: float, stop: float = 1.0) → Iterator[float]

Linearly decrease start in step steps to stop.
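A minimal sketch of the two decay schedules follows. It makes several assumptions that the descriptions above do not confirm: step is treated as an integer count of values, both endpoints are included, and the exponential variant is a geometric interpolation between start and stop. The function names are hypothetical.

```python
from typing import Iterator


def linear_decay(start: float, n_steps: int, stop: float = 1.0) -> Iterator[float]:
    """Yield n_steps values falling linearly from start to stop (assumed inclusive)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    delta = (start - stop) / (n_steps - 1)
    for i in range(n_steps):
        yield start - i * delta


def expo_decay(start: float, n_steps: int, stop: float = 1.0) -> Iterator[float]:
    """Yield n_steps values falling geometrically from start to stop (assumed inclusive)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    ratio = stop / start                            # requires start and stop to share a sign
    for i in range(n_steps):
        yield start * ratio ** (i / (n_steps - 1))


linear_values = list(linear_decay(10.0, 4))         # 10.0, 7.0, 4.0, 1.0
expo_values = list(expo_decay(8.0, 4))              # roughly 8, 4, 2, 1
```

Schedules of this kind are typically used to shrink the learning rate or neighbourhood radius over the course of SOM training.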

apollon.som.utilities.distribute(bmu_idx: Iterable[int], n_units: int) → Dict[int, List[int]]

List training data matches per SOM unit.

This method assumes that the ith element of bmu_idx corresponds to the ith vector in an array of input data vectors.

Units without matches map to an empty list.

Parameters
  • bmu_idx – Indices of best matching units.

  • n_units – Number of units on the SOM.

Returns

Dictionary in which the keys represent the flat indices of SOM units. The corresponding value is a list of indices of those training data vectors that have been mapped to this unit.
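The mapping described above can be sketched in a few lines. This is an illustrative stand-in under the stated contract, not apollon's implementation; distribute_sketch is a hypothetical name.

```python
from typing import Dict, Iterable, List


def distribute_sketch(bmu_idx: Iterable[int], n_units: int) -> Dict[int, List[int]]:
    """Illustrative only: group data indices by their best matching unit."""
    matches: Dict[int, List[int]] = {unit: [] for unit in range(n_units)}
    for data_idx, unit in enumerate(bmu_idx):
        matches[unit].append(data_idx)              # ith BMU belongs to ith data vector
    return matches


# Four data vectors on a four-unit map; unit 3 receives no data.
result = distribute_sketch([2, 0, 2, 1], n_units=4)
# result == {0: [1], 1: [3], 2: [0, 2], 3: []}
```

Pre-filling the dictionary with every unit index is what guarantees that unmatched units still appear with an empty list.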

apollon.som.utilities.grid(n_rows: int, n_cols: int) → numpy.ndarray

Compute grid indices of a two-dimensional array.

Parameters
  • n_rows – Number of array rows.

  • n_cols – Number of array columns.

Returns

Two-dimensional array in which each row represents a multi-index.

apollon.som.utilities.grid_iter(n_rows: int, n_cols: int) → Iterator[Tuple[int, int]]

Compute grid indices of a two-dimensional array.

Parameters
  • n_rows – Number of array rows.

  • n_cols – Number of array columns.

Returns

Multi-index iterator.
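The relation between grid and grid_iter can be illustrated as below. The sketch assumes row-major ordering (the column index varies fastest), which the descriptions above do not state explicitly; the function names are hypothetical.

```python
import itertools
from typing import Iterator, Tuple

import numpy as np


def grid_iter_sketch(n_rows: int, n_cols: int) -> Iterator[Tuple[int, int]]:
    """Illustrative only: lazily yield (row, col) pairs in row-major order."""
    return itertools.product(range(n_rows), range(n_cols))


def grid_sketch(n_rows: int, n_cols: int) -> np.ndarray:
    """Illustrative only: the same multi-indices as an (n_rows * n_cols, 2) array."""
    return np.array(list(grid_iter_sketch(n_rows, n_cols)))


indices = grid_sketch(2, 3)
# rows: [0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]
```

The iterator form avoids allocating the full index array, which may matter when only a single pass over a large map is needed.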

apollon.som.utilities.sample_hist(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Sample sum-normalized histograms.

Parameters
  • dims – Dimensions of SOM.

  • data – Input data set.

Returns

Two-dimensional array in which each row is a histogram.

apollon.som.utilities.sample_pca(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Compute initial SOM weights by sampling from the first two principal components of the input data.

Parameters
  • dims – Dimensions of SOM.

  • data – Input data set.

  • adapt – If True, the largest value of shape is applied to the principal component with the largest singular value. This orients the map such that the map dimension with the most units coincides with the principal component with the largest variance.

Returns

Array of SOM weights.
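One common way to initialize weights from two principal components is to lay a regular grid over the plane they span. The sketch below shows that idea only. It is not confirmed to match apollon's sampling scheme, it ignores the adapt option, and it assumes dims is ordered as (n_rows, n_cols, n_features). The name sample_pca_sketch is hypothetical.

```python
import numpy as np


def sample_pca_sketch(dims, data):
    """Illustrative only: regular grid in the plane of the first two PCs."""
    n_rows, n_cols, _ = dims                        # assumed (rows, cols, features)
    mean = data.mean(axis=0)
    # Rows of `components` are principal axes, ordered by singular value.
    _, sing_vals, components = np.linalg.svd(data - mean, full_matrices=False)
    scale = sing_vals[:2] / np.sqrt(data.shape[0] - 1)   # std. dev. along each axis
    rows = np.linspace(-1.0, 1.0, n_rows) * scale[0]
    cols = np.linspace(-1.0, 1.0, n_cols) * scale[1]
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    coords = np.column_stack([rr.ravel(), cc.ravel()])   # (n_units, 2)
    return mean + coords @ components[:2]                # (n_units, n_features)


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
pca_weights = sample_pca_sketch((3, 5, 4), data)
```

Here the first map axis is tied to the first principal component regardless of the map shape; the adapt option described above would instead pair the longer map axis with the component of largest variance.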

apollon.som.utilities.sample_rnd(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Compute initial SOM weights by sampling uniformly from the data space.

Parameters
  • dims – Dimensions of SOM.

  • data – Input data set. If None, sample from [-10, 10].

Returns

Array of SOM weights.
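Uniform initialization can be sketched as follows. The assumptions here are not taken from the library: dims is read as (n_rows, n_cols, n_features), the data space is taken to be the per-feature minimum and maximum of data, and the seed argument and function name are hypothetical.

```python
import numpy as np


def sample_rnd_sketch(dims, data=None, seed=None):
    """Illustrative only: draw one uniform weight vector per SOM unit."""
    n_rows, n_cols, n_features = dims               # assumed (rows, cols, features)
    rng = np.random.default_rng(seed)
    if data is None:
        low, high = -10.0, 10.0                     # fallback range named in the docs
    else:
        low, high = data.min(axis=0), data.max(axis=0)   # per-feature bounds (assumption)
    return rng.uniform(low, high, size=(n_rows * n_cols, n_features))


default_weights = sample_rnd_sketch((2, 3, 4), seed=1)
sample_data = np.array([[0.0, 5.0], [1.0, 7.0], [0.5, 6.0]])
bounded_weights = sample_rnd_sketch((2, 2, 2), data=sample_data, seed=1)
```

Sampling within the observed range keeps every initial weight inside the bounding box of the training data.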

apollon.som.utilities.sample_stm(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray

Compute initial SOM weights by sampling stochastic matrices from a Dirichlet distribution.

The rows of each n-by-n stochastic matrix are samples drawn from the Dirichlet distribution, where n is the number of rows and columns of the matrix. The diagonal elements of the matrices are set to twice the probability of the remaining elements. The square root of the weight vectors’ size must be an integer.

Parameters
  • dims – Dimensions of SOM.

  • data – Input data set.

Returns

Array of SOM weights.

Notes

Each row of the output array is to be considered a flattened stochastic matrix: with N = sqrt(data.shape[1]), each consecutive run of N values is a discrete probability distribution that forms one row of the N-by-N matrix.
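The flattened layout can be illustrated with NumPy's Dirichlet sampler. This sketch uses a uniform concentration vector and deliberately leaves out the diagonal weighting described above. The variable names and the sizes (four units, 3-by-3 matrices) are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3                                               # side length of each stochastic matrix
n_units = 4                                         # number of SOM units in this example

# One Dirichlet draw per matrix row: shape (n_units, n, n), each last axis sums to 1.
matrices = rng.dirichlet(np.ones(n), size=(n_units, n))
stm_weights = matrices.reshape(n_units, n * n)      # one flattened matrix per row

# Recover the matrix for the first unit from its flat weight vector.
side = int(np.sqrt(stm_weights.shape[1]))
recovered = stm_weights[0].reshape(side, side)
row_sums = recovered.sum(axis=1)                    # each entry is 1 up to rounding
```

Reshaping a weight vector back to (N, N) in row-major order restores the stochastic matrix, so every row of recovered is a probability distribution.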