dynsight.analysis.compute_entropy_gain_multi

dynsight.analysis.compute_entropy_gain_multi(data, labels, n_bins, method='histo')

Compute the relative information gained by the clustering.

Deprecated since version v2025.08.27: This function is deprecated and will be removed after June 2026. Use analysis.info_gain() instead.

Parameters:

data (ndarray[Any, dtype[float64]]) – shape (n_samples, n_dimensions). The dataset over which the clustering is performed.
labels (ndarray[Any, dtype[int64]]) – shape (n_samples,). The clustering labels, one per sample.
n_bins (list[int]) – The number of bins used to compute the data histogram, one for each dimension.
method (Literal['histo', 'kl']) – How the Shannon entropy is computed. Use “histo” for discrete variables and “kl” for continuous variables. If “kl” is chosen, the “n_bins” argument is irrelevant. See the documentation of compute_shannon_multi() and compute_kl_entropy_multi() for more details.

Returns:

The absolute information gain \(H_0 - H_{clust}\)
The relative information gain \((H_0 - H_{clust}) / H_0\)
The Shannon entropy of the initial data \(H_0\)
The Shannon entropy of the clustered data \(H_{clust}\)

Return type:

tuple[float, float, float, float]
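To make the four returned quantities concrete, the following is a minimal NumPy sketch of a histogram-based entropy gain. It is not dynsight's implementation: the helper names `histogram_entropy` and `entropy_gain_sketch` are hypothetical, and the sketch assumes that \(H_{clust}\) is the average of the per-cluster entropies weighted by cluster population, that all histograms share the same bin edges, and that entropies are measured in bits.

```python
import numpy as np


def histogram_entropy(data, n_bins, ranges):
    """Shannon entropy (in bits, an assumption) of a multi-dimensional histogram."""
    counts, _ = np.histogramdd(data, bins=n_bins, range=ranges)
    probs = counts[counts > 0] / counts.sum()  # drop empty bins: 0 * log(0) := 0
    return float(-np.sum(probs * np.log2(probs)))


def entropy_gain_sketch(data, labels, n_bins):
    """Illustrative only: returns (H_0 - H_clust, relative gain, H_0, H_clust)."""
    # Use the same bin edges for the full dataset and for every cluster,
    # so that the entropies are directly comparable.
    ranges = [(data[:, d].min(), data[:, d].max()) for d in range(data.shape[1])]
    h_0 = histogram_entropy(data, n_bins, ranges)

    # Assumed definition: population-weighted average of the cluster entropies.
    h_clust = 0.0
    for label in np.unique(labels):
        mask = labels == label
        h_clust += mask.mean() * histogram_entropy(data[mask], n_bins, ranges)

    abs_gain = h_0 - h_clust
    return abs_gain, abs_gain / h_0, h_0, h_clust
```

Under these assumptions, a clustering that separates the data perfectly gives a relative gain of 1, while labels that are independent of the data give a gain close to 0 (up to finite-sample effects).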

Example

import numpy as np
from dynsight.analysis import compute_entropy_gain_multi

np.random.seed(1234)
data = np.random.rand(1000, 2)  # 2D dataset
n_bins = [40, 40]
labels = np.random.randint(-1, 2, size=1000)

_, entropy_gain, *_ = compute_entropy_gain_multi(
    data,
    labels,
    n_bins=n_bins,
)