dynsight.analysis.compute_entropy_gain

dynsight.analysis.compute_entropy_gain(data, labels, method='histo', n_bins=20)

Compute the relative information gained by the clustering.

Deprecated since version v2025.08.27: This function is deprecated and will be removed after June 2026. Use analysis.info_gain() instead.

Parameters:
  • data (ndarray[Any, dtype[float64]]) – The dataset over which the clustering is performed.

  • labels (ndarray[Any, dtype[int64]]) – The clustering labels. Has the same shape as “data”.

  • n_bins (int) – The number of bins used to compute the data histogram. Default is 20.

  • method (Literal['histo', 'kl']) – How the Shannon entropy is computed. Use “histo” for discrete variables and “kl” for continuous variables. If “kl” is chosen, the “n_bins” argument is not used. See the documentation of compute_shannon() and compute_kl_entropy() for more details.

Returns:

  • The absolute information gain \(H_0 - H_{clust}\)

  • The relative information gain \((H_0 - H_{clust}) / H_0\)

  • The Shannon entropy of the initial data \(H_0\)

  • The Shannon entropy of the clustered data \(H_{clust}\)

Return type:

tuple[float, float, float, float]

Note

The outputs are expressed as fractions if method is “histo”, and in bits if method is “kl”.

Example

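import numpy as np
from dynsight.analysis import compute_entropy_gain

# Reproducible random data and labels with the same shape.
np.random.seed(1234)
data = np.random.rand(100, 100)
labels = np.random.randint(-1, 2, size=(100, 100))  # labels in {-1, 0, 1}

# Keep only the second return value, the relative information gain.
_, entropy_gain, *_ = compute_entropy_gain(
    data,
    labels,
    n_bins=40,
)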
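
To make the four returned quantities concrete, the following NumPy-only sketch computes them by hand. It does not use dynsight and should not be read as the library's implementation. It assumes that \(H_{clust}\) is the population-weighted mean of the per-cluster histogram entropies, and it reports entropies in bits without any normalization. The names shannon_entropy_bits and sketch_entropy_gain are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def shannon_entropy_bits(values, data_range, n_bins):
    """Histogram-based Shannon entropy of `values`, in bits (hypothetical helper)."""
    counts, _ = np.histogram(values, bins=n_bins, range=data_range)
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def sketch_entropy_gain(data, labels, n_bins=20):
    """Illustrative sketch only, not the dynsight implementation."""
    # Use one common range so every histogram shares the same bin edges.
    data_range = (float(data.min()), float(data.max()))
    h_0 = shannon_entropy_bits(data.ravel(), data_range, n_bins)

    # Assumption: H_clust is the weighted mean of the per-cluster entropies,
    # with weights equal to the fraction of points in each cluster.
    h_clust = 0.0
    for label in np.unique(labels):
        mask = labels == label
        weight = mask.sum() / labels.size
        h_clust += weight * shannon_entropy_bits(data[mask], data_range, n_bins)

    gain = h_0 - h_clust
    # Same ordering as the Returns list: absolute gain, relative gain, H_0, H_clust.
    return gain, gain / h_0, h_0, h_clust


rng = np.random.default_rng(1234)
data = rng.random((100, 100))

# Labels that follow the data (lower vs. upper half of the value range).
informative = (data > 0.5).astype(int)
# Labels drawn independently of the data.
random_labels = rng.integers(-1, 2, size=data.shape)

gain_inf = sketch_entropy_gain(data, informative)
gain_rand = sketch_entropy_gain(data, random_labels)
print("informative labels, relative gain:", gain_inf[1])
print("random labels, relative gain:", gain_rand[1])
```

Under these assumptions, labels that track the data give a clearly larger gain than labels drawn at random, which is why random labels, as in the example above, are expected to yield a gain close to zero.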