dynsight.analysis.compute_entropy_gain
- dynsight.analysis.compute_entropy_gain(data, labels, method='histo', n_bins=20)
Compute the relative information gained by the clustering.
Deprecated since version v2025.08.27: This function is deprecated and will be removed after June 2026. Use analysis.info_gain() instead.
- Parameters:
data (ndarray[Any, dtype[float64]]) – The dataset over which the clustering is performed.
labels (ndarray[Any, dtype[int64]]) – The clustering labels. Has the same shape as “data”.
n_bins (int) – The number of bins used to compute the data histogram. Default is 20.
method (Literal['histo', 'kl']) – How the Shannon entropy is computed. Use “histo” for discrete variables and “kl” for continuous variables. If “kl” is chosen, the “n_bins” argument is irrelevant. See the documentation of compute_shannon() and compute_kl_entropy() for more details.
- Returns:
The absolute information gain \(H_0 - H_{clust}\)
The relative information gain \((H_0 - H_{clust}) / H_0\)
The Shannon entropy of the initial data \(H_0\)
The Shannon entropy of the clustered data \(H_{clust}\)
- Return type:
Note
The outputs are expressed as fractions if method is “histo”, and in bits if method is “kl”.
Example
import numpy as np
from dynsight.analysis import compute_entropy_gain

np.random.seed(1234)
data = np.random.rand(100, 100)
labels = np.random.randint(-1, 2, size=(100, 100))

# The second returned value is the relative information gain.
_, entropy_gain, *_ = compute_entropy_gain(
    data,
    labels,
    n_bins=40,
)
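Sketch

The four quantities listed under Returns can be illustrated with plain NumPy for the “histo” case. This is a minimal sketch and not the dynsight implementation. It assumes that \(H_{clust}\) is the population-weighted average of the per-cluster entropies, that every histogram uses the same bin edges as the full data set, and that “fractions” means the entropy is divided by \(\log_2\) of n_bins. None of these assumptions is confirmed by this page. The names shannon_fraction and entropy_gain_sketch are hypothetical helpers and are not part of dynsight.

```python
import numpy as np


def shannon_fraction(values, n_bins, value_range):
    """Histogram Shannon entropy divided by log2(n_bins) (assumed normalization)."""
    counts, _ = np.histogram(values, bins=n_bins, range=value_range)
    # Drop empty bins so that log2 is never evaluated at zero.
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)) / np.log2(n_bins))


def entropy_gain_sketch(data, labels, n_bins=20):
    """Return (H_0 - H_clust, (H_0 - H_clust) / H_0, H_0, H_clust)."""
    data = np.asarray(data, dtype=float).ravel()
    labels = np.asarray(labels).ravel()
    # Use one common range so all histograms share the same bin edges.
    value_range = (data.min(), data.max())

    h_0 = shannon_fraction(data, n_bins, value_range)

    # Assumed definition: weight each cluster's entropy by its population fraction.
    h_clust = 0.0
    for label in np.unique(labels):
        mask = labels == label
        h_clust += mask.mean() * shannon_fraction(data[mask], n_bins, value_range)

    gain = h_0 - h_clust
    # Note: the relative gain is undefined when h_0 is zero (all data in one bin).
    return gain, gain / h_0, h_0, h_clust


# Two perfectly separated clusters: the clustering removes all of the entropy.
data = np.concatenate([np.zeros(50), np.ones(50)])
labels = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
gain, relative_gain, h_0, h_clust = entropy_gain_sketch(data, labels, n_bins=2)
print(gain, relative_gain, h_0, h_clust)
```

Under these assumptions, labels that are unrelated to the data, as in the example above, yield a gain close to zero, while labels that split the data into narrow, well-separated groups yield a relative gain close to one.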