apollon.som.utilities module
apollon/som/utilities.py
Utilities for self-organizing maps.
Licensed under the terms of the BSD-3-Clause license. Copyright (C) 2019 Michael Blaß mblass@posteo.net
-
apollon.som.utilities.best_match(weights: numpy.ndarray, inp: numpy.ndarray, metric: str)
Compute the best matching unit of weights for each element in inp.
If several elements in weights have the same distance to the current element of inp, the first element of weights is chosen to be the best matching unit.
- Parameters
weights – Two-dimensional array of weights, in which each row represents a unit.
inp – Array of test vectors. If two-dimensional, rows are assumed to represent observations.
metric – Distance metric to use.
- Returns
Index and error of best matching units.
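The following is a minimal sketch of the lookup described above, not the module's own code. It assumes a Euclidean metric instead of the configurable metric argument, and the name best_match_sketch is hypothetical. It relies on numpy.argmin returning the first minimum, which reproduces the tie-breaking rule.

```python
import numpy as np


def best_match_sketch(weights: np.ndarray, inp: np.ndarray):
    """Return (indices, errors) of the best matching units (Euclidean only)."""
    inp = np.atleast_2d(inp)  # treat a single vector as one observation
    # Pairwise distances: rows are observations, columns are SOM units.
    dists = np.linalg.norm(inp[:, None, :] - weights[None, :, :], axis=2)
    # argmin returns the first minimum, so ties go to the lowest unit index.
    bmu = dists.argmin(axis=1)
    err = dists[np.arange(inp.shape[0]), bmu]
    return bmu, err


# Units 0 and 2 are identical, so the first observation is tied between them.
weights = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
inp = np.array([[0.1, 0.0], [0.9, 1.0]])
bmu, err = best_match_sketch(weights, inp)
```

Here the first observation is equally close to units 0 and 2, and unit 0 is reported.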
-
apollon.som.utilities.decrease_expo(start: float, step: float, stop: float = 1.0) → Iterator[float]
Exponentially decrease start in step steps to stop.
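One way to realize such a schedule is sketched below. The decay formula, the argument checks, and the name decrease_expo_sketch are assumptions made for illustration; the module may compute the sequence differently.

```python
import math
from typing import Iterator


def decrease_expo_sketch(start: float, step: int, stop: float = 1.0) -> Iterator[float]:
    """Yield `step` values that decay exponentially from `start` to `stop`."""
    if step < 2 or stop <= 0 or stop >= start:
        raise ValueError("Need step >= 2 and 0 < stop < start.")
    # Negative rate chosen so that the last value equals `stop`.
    rate = math.log(stop / start) / (step - 1)
    for i in range(step):
        yield start * math.exp(rate * i)


# Four steps from 8 down to 1 halve the value each time.
values = list(decrease_expo_sketch(8.0, 4, stop=1.0))
```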
-
apollon.som.utilities.decrease_linear(start: float, step: float, stop: float = 1.0) → Iterator[float]
Linearly decrease start in step steps to stop.
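A linear schedule can be sketched in the same way. The slope formula, the argument checks, and the name decrease_linear_sketch are assumptions for illustration only.

```python
from typing import Iterator


def decrease_linear_sketch(start: float, step: int, stop: float = 1.0) -> Iterator[float]:
    """Yield `step` equally spaced values from `start` down to `stop`."""
    if step < 2 or stop >= start:
        raise ValueError("Need step >= 2 and stop < start.")
    slope = (stop - start) / (step - 1)  # negative increment per step
    for i in range(step):
        yield start + slope * i


# Four steps from 10 down to 1 decrease by 3 each time.
values = list(decrease_linear_sketch(10.0, 4, stop=1.0))
```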
-
apollon.som.utilities.distribute(bmu_idx: Iterable[int], n_units: int) → Dict[int, List[int]]
List training data matches per SOM unit.
This method assumes that the ith element of bmu_idx corresponds to the ith vector in an array of input data vectors. Empty units result in empty lists.
- Parameters
bmu_idx – Indices of best matching units.
n_units – Number of units on the SOM.
- Returns
Dictionary in which the keys represent the flat indices of SOM units. The corresponding value is a list of indices of those training data vectors that have been mapped to this unit.
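The mapping described above can be sketched with a plain dictionary. The name distribute_sketch is hypothetical, and the code only illustrates the documented behavior.

```python
from typing import Dict, Iterable, List


def distribute_sketch(bmu_idx: Iterable[int], n_units: int) -> Dict[int, List[int]]:
    """Group data vector indices by the flat index of their best matching unit."""
    # Create every key up front so that empty units map to empty lists.
    matches: Dict[int, List[int]] = {unit: [] for unit in range(n_units)}
    for data_idx, unit in enumerate(bmu_idx):
        matches[unit].append(data_idx)
    return matches


# Data vectors 0 and 2 hit unit 2; unit 3 receives no data.
result = distribute_sketch([2, 0, 2, 1], n_units=4)
```

In this example, result[2] holds the indices 0 and 2, while result[3] is an empty list.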
-
apollon.som.utilities.grid(n_rows: int, n_cols: int) → numpy.ndarray
Compute grid indices of a two-dimensional array.
- Parameters
n_rows – Number of array rows.
n_cols – Number of array columns.
- Returns
Two-dimensional array in which each row represents a multi-index.
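As an illustration of the return value, the sketch below builds such an index array with numpy.ndindex. Row-major ordering and the name grid_sketch are assumptions.

```python
import numpy as np


def grid_sketch(n_rows: int, n_cols: int) -> np.ndarray:
    """Return an (n_rows * n_cols, 2) array of (row, column) index pairs."""
    # np.ndindex iterates in row-major (C) order.
    return np.array(list(np.ndindex(n_rows, n_cols)))


indices = grid_sketch(2, 3)
```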
-
apollon.som.utilities.grid_iter(n_rows: int, n_cols: int) → Iterator[Tuple[int, int]]
Compute grid indices of a two-dimensional array.
- Parameters
n_rows – Number of array rows.
n_cols – Number of array columns.
- Returns
Multi-index iterator.
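A lazy variant can be sketched with itertools.product. The row-major ordering and the name grid_iter_sketch are assumptions for illustration.

```python
import itertools
from typing import Iterator, Tuple


def grid_iter_sketch(n_rows: int, n_cols: int) -> Iterator[Tuple[int, int]]:
    """Lazily yield (row, column) index pairs in row-major order."""
    return itertools.product(range(n_rows), range(n_cols))


pairs = list(grid_iter_sketch(2, 2))
```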
-
apollon.som.utilities.sample_hist(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray
Sample sum-normalized histograms.
- Parameters
dims – Dimensions of SOM.
data – Input data set.
- Returns
Two-dimensional array in which each row is a histogram.
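To illustrate what sum-normalized means here, the sketch below draws random non-negative rows and divides each by its sum. It assumes dims is ordered as (rows, columns, features), ignores the data argument, and uses the hypothetical name sample_hist_sketch; the sampling scheme of the module itself may differ.

```python
import numpy as np


def sample_hist_sketch(dims, seed=0):
    """Return one random histogram per SOM unit; every row sums to one."""
    n_rows, n_cols, n_feats = dims  # assumed order: rows, columns, features
    rng = np.random.default_rng(seed)
    counts = rng.random((n_rows * n_cols, n_feats))
    # Dividing by the row sum turns each row into a discrete distribution.
    return counts / counts.sum(axis=1, keepdims=True)


hists = sample_hist_sketch((3, 4, 5))
```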
-
apollon.som.utilities.sample_pca(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray
Compute initial SOM weights by sampling from the first two principal components of the input data.
- Parameters
dims – Dimensions of SOM.
data – Input data set.
adapt – If True, the largest value of shape is applied to the principal component with the largest singular value. This orients the map such that the map dimension with the most units coincides with the principal component with the largest variance.
- Returns
Array of SOM weights.
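The idea can be sketched as follows: the weights are placed on a regular grid in the plane spanned by the first two principal axes. The grid range of one standard deviation per axis, the (rows, columns, features) order of dims, the handling of adapt, and the name sample_pca_sketch are all assumptions, not the module's implementation.

```python
import numpy as np


def sample_pca_sketch(dims, data, adapt=True):
    """Place SOM weights on a grid spanned by the first two principal axes."""
    n_rows, n_cols, n_feats = dims  # assumed order: rows, columns, features
    mean = data.mean(axis=0)
    # Rows of vt are principal axes; s holds singular values in descending order.
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    spread = s[:2] / np.sqrt(data.shape[0] - 1)  # standard deviation per axis
    row_pc, col_pc = 0, 1
    if adapt and n_cols > n_rows:
        row_pc, col_pc = 1, 0  # the longer map side follows the first component
    row_coef = np.linspace(-1.0, 1.0, n_rows) * spread[row_pc]
    col_coef = np.linspace(-1.0, 1.0, n_cols) * spread[col_pc]
    weights = (mean
               + row_coef[:, None, None] * vt[row_pc]
               + col_coef[None, :, None] * vt[col_pc])
    return weights.reshape(n_rows * n_cols, n_feats)


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
weights = sample_pca_sketch((2, 5, 3), data)
```

Every resulting weight vector lies in the plane through the data mean that is spanned by the two leading principal axes.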
-
apollon.som.utilities.sample_rnd(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray
Compute initial SOM weights by sampling uniformly from the data space.
- Parameters
dims – Dimensions of SOM.
data – Input data set. If None, sample from [-10, 10].
- Returns
Array of SOM weights.
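A minimal sketch of uniform initialization is given below. It assumes that "data space" means the per-feature minimum and maximum of data, that dims is ordered as (rows, columns, features), and it uses the hypothetical name sample_rnd_sketch.

```python
import numpy as np


def sample_rnd_sketch(dims, data=None, seed=0):
    """Draw one uniformly distributed weight vector per SOM unit."""
    n_rows, n_cols, n_feats = dims  # assumed order: rows, columns, features
    rng = np.random.default_rng(seed)
    if data is None:
        low, high = -10.0, 10.0  # documented fallback interval
    else:
        # Assumption: sample within the per-feature range of the data.
        low, high = data.min(axis=0), data.max(axis=0)
    return rng.uniform(low, high, size=(n_rows * n_cols, n_feats))


default_weights = sample_rnd_sketch((2, 3, 4))
data = np.array([[0.0, 5.0], [1.0, 7.0], [0.5, 6.0]])
data_weights = sample_rnd_sketch((2, 2, 2), data)
```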
-
apollon.som.utilities.sample_stm(dims: Tuple[int, int, int], data: Optional[numpy.ndarray] = None, **kwargs) → numpy.ndarray
Compute initial SOM weights by sampling stochastic matrices from a Dirichlet distribution.
The rows of each n by n stochastic matrix are samples drawn from the Dirichlet distribution, where n is the number of rows and columns of the matrix. The diagonal elements of the matrices are set to twice the probability of the remaining elements. The square root of the weight vectors’ size must be an integer.
- Parameters
dims – Dimensions of SOM.
data – Input data set.
- Returns
Array of SOM weights.
Notes
Each row of the output array is to be considered a flattened stochastic matrix, such that each block of N = sqrt(data.shape[1]) consecutive values is a discrete probability distribution forming one row of the matrix.
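The flattened layout can be illustrated with the sketch below. It uses a symmetric Dirichlet distribution and deliberately omits the extra weight on the diagonal, takes the weight vector size from dims (assumed to be ordered as rows, columns, features), and uses the hypothetical name sample_stm_sketch.

```python
import math

import numpy as np


def sample_stm_sketch(dims, seed=0):
    """Return one flattened n-by-n stochastic matrix per SOM unit."""
    n_rows, n_cols, n_feats = dims  # assumed order: rows, columns, features
    n = math.isqrt(n_feats)
    if n * n != n_feats:
        raise ValueError("Weight vector size must be a perfect square.")
    rng = np.random.default_rng(seed)
    # Shape (n_units, n, n): every length-n row is one Dirichlet sample.
    matrices = rng.dirichlet(np.ones(n), size=(n_rows * n_cols, n))
    # Flatten row by row, so consecutive blocks of n values are matrix rows.
    return matrices.reshape(n_rows * n_cols, n_feats)


weights = sample_stm_sketch((2, 3, 9))
first_matrix = weights[0].reshape(3, 3)  # recover the matrix of unit 0
```

Reshaping any output row to n by n recovers a matrix whose rows each sum to one.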