dynsight.analysis.compute_entropy_gain_multi

dynsight.analysis.compute_entropy_gain_multi(data, labels, n_bins, method='histo')

Compute the relative information gained by the clustering.

Deprecated since version v2025.08.27: This function is deprecated and will be removed after June 2026. Use analysis.info_gain() instead.

Parameters:
  • data (ndarray[Any, dtype[float64]]) – Array of shape (n_samples, n_dimensions). The dataset over which the clustering is performed.

  • labels (ndarray[Any, dtype[int64]]) – Array of shape (n_samples,). The clustering labels, one per sample.

  • n_bins (list[int]) – The number of bins used to compute the data histogram, one per dimension.

  • method (Literal['histo', 'kl']) – How the Shannon entropy is computed. Use “histo” for discrete variables and “kl” for continuous variables. If “kl” is chosen, the “n_bins” argument is irrelevant. See the documentation of compute_shannon_multi() and compute_kl_entropy_multi() for more details.

Returns:

  • The absolute information gain \(H_0 - H_{clust}\)

  • The relative information gain \((H_0 - H_{clust}) / H_0\)

  • The Shannon entropy of the initial data \(H_0\)

  • The Shannon entropy of the clustered data \(H_{clust}\)

Return type:

tuple[float, float, float, float]
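To make the four returned quantities concrete, the sketch below recomputes them with plain NumPy for the “histo” case. It is illustrative only and is not the dynsight implementation: the helper names shannon_histo and entropy_gain_sketch are hypothetical, and it assumes that \(H_{clust}\) is the average of the per-cluster entropies weighted by cluster population, that every cluster is binned over the same range as the full dataset, and that entropies are measured in bits. The library may normalize or bin differently, so the numbers need not match its output.

```python
import numpy as np


def shannon_histo(data, n_bins, data_range):
    """Shannon entropy (in bits, an assumption) of a multi-dimensional histogram."""
    counts, _ = np.histogramdd(data, bins=n_bins, range=data_range)
    probs = counts[counts > 0] / counts.sum()  # drop empty bins
    return float(-np.sum(probs * np.log2(probs)))


def entropy_gain_sketch(data, labels, n_bins):
    """Hypothetical helper: NOT the dynsight implementation."""
    # Assumption: the full dataset and every cluster share the same bin range.
    data_range = [(col.min(), col.max()) for col in data.T]
    h_0 = shannon_histo(data, n_bins, data_range)

    # Assumption: H_clust is the population-weighted mean of cluster entropies.
    h_clust = 0.0
    for label in np.unique(labels):
        mask = labels == label
        h_clust += mask.mean() * shannon_histo(data[mask], n_bins, data_range)

    gain = h_0 - h_clust
    return gain, gain / h_0, h_0, h_clust


# Two perfectly separated clusters: knowing the label removes all uncertainty.
toy_data = np.array([[0.0, 0.0]] * 4 + [[1.0, 1.0]] * 4)
toy_labels = np.array([0] * 4 + [1] * 4)
gain, rel_gain, h_0, h_clust = entropy_gain_sketch(toy_data, toy_labels, [2, 2])
```

In this toy case the labels separate the two occupied bins exactly, so the clustered entropy vanishes and the relative gain reaches its maximum of 1. With labels that carry no information about the data, the relative gain stays close to 0.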

Example

import numpy as np
from dynsight.analysis import compute_entropy_gain_multi

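np.random.seed(1234)
data = np.random.rand(1000, 2)  # 2D dataset
n_bins = [40, 40]  # one bin count per dimension
labels = np.random.randint(-1, 2, size=1000)  # random labels in {-1, 0, 1}

# Keep only the second returned value, the relative information gain.
_, entropy_gain, *_ = compute_entropy_gain_multi(
    data,
    labels,
    n_bins=n_bins,
)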