dynsight.analysis.compute_entropy_gain_multi
- dynsight.analysis.compute_entropy_gain_multi(data, labels, n_bins, method='histo')
Compute the relative information gained by the clustering.
Deprecated since version v2025.08.27: This function is deprecated and will be removed after June 2026. Use analysis.info_gain() instead.
- Parameters:
data (ndarray[Any, dtype[float64]]) – shape (n_samples, n_dimensions). The dataset over which the clustering is performed.
labels (ndarray[Any, dtype[int64]]) – shape (n_samples,). The clustering labels.
n_bins (list[int]) – The number of bins with which the data histogram must be computed, one for each dimension.
method (Literal['histo', 'kl']) – How the Shannon entropy is computed. Use “histo” for discrete variables and “kl” for continuous variables. If “kl” is chosen, the n_bins argument is irrelevant. See the documentation of compute_shannon_multi() and compute_kl_entropy_multi() for more details.
- Returns:
The absolute information gain \(H_0 - H_{clust}\)
The relative information gain \((H_0 - H_{clust}) / H_0\)
The Shannon entropy of the initial data \(H_0\)
The Shannon entropy of the clustered data \(H_{clust}\)
- Return type:
Example
import numpy as np
from dynsight.analysis import compute_entropy_gain_multi

np.random.seed(1234)
data = np.random.rand(1000, 2)  # 2D dataset
n_bins = [40, 40]
labels = np.random.randint(-1, 2, size=1000)

_, entropy_gain, *_ = compute_entropy_gain_multi(
    data,
    labels,
    n_bins=n_bins,
)
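To make the four returned quantities concrete, the following NumPy-only sketch computes them for the “histo” case. It is an illustration under stated assumptions, not the dynsight implementation: the helper names shannon_histo and entropy_gain_sketch are hypothetical, \(H_{clust}\) is assumed to be the average of the per-cluster entropies weighted by cluster population, all histograms are assumed to share the bin edges of the full dataset, and entropies are assumed to be in bits.

```python
import numpy as np


def shannon_histo(data, n_bins, ranges):
    """Shannon entropy (bits) of a multi-dimensional histogram (hypothetical helper)."""
    counts, _ = np.histogramdd(data, bins=n_bins, range=ranges)
    probs = counts[counts > 0] / counts.sum()  # drop empty bins, normalize
    return float(-np.sum(probs * np.log2(probs)))


def entropy_gain_sketch(data, labels, n_bins):
    """Return (absolute gain, relative gain, H_0, H_clust) under the stated assumptions."""
    # Assumption: every histogram uses the same bin edges as the full dataset.
    ranges = [(data[:, d].min(), data[:, d].max()) for d in range(data.shape[1])]
    h_0 = shannon_histo(data, n_bins, ranges)

    # Assumption: H_clust is the population-weighted mean of the cluster entropies.
    h_clust = 0.0
    for label in np.unique(labels):
        mask = labels == label
        h_clust += mask.mean() * shannon_histo(data[mask], n_bins, ranges)

    abs_gain = h_0 - h_clust
    return abs_gain, abs_gain / h_0, h_0, h_clust


rng = np.random.default_rng(1234)
data = rng.random((1000, 2))  # 2D dataset
labels = rng.integers(-1, 2, size=1000)  # labels in {-1, 0, 1}
abs_gain, rel_gain, h_0, h_clust = entropy_gain_sketch(data, labels, [40, 40])
```

With shared bin edges, the absolute gain in this sketch equals the mutual information between the binned data and the labels, so it is non-negative and the relative gain lies between 0 and 1.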