dynsight.analysis.shannon
- dynsight.analysis.shannon(data, method, base=2.0, n_neigh=4)
Compute the Shannon entropy of a data distribution.
- Parameters:
data (NDArray[np.float64]) – The data for which the entropy is to be computed. Has shape (n_samples, n_features).
method (Literal['histo', 'kl']) – How the Shannon entropy is computed. Use “histo” for discrete variables and “kl” for continuous variables. The “n_neigh” argument is irrelevant when “histo” is chosen. See the documentation of the infomeasure package for more details (link in the notes below).
base (float) – The base of the logarithm, which sets the unit of measure of the returned value. Use “2” for bits, “np.e” for nats.
n_neigh (int) – The number of neighbors considered in the KL estimator. The default value n_neigh = 4 is recommended in the literature.
- Returns:
The value of the Shannon entropy of the data.
- Return type:
Notes
This function uses the infomeasure.entropy() function; see https://infomeasure.readthedocs.io/en/latest/guide/entropy/.
Example
import numpy as np
from dynsight.analysis import shannon

rng = np.random.default_rng(seed=42)

### Discrete case: fair coin. H = 1 bit. ###
int_data = rng.integers(low=0, high=2, size=100000)
h_int = shannon(data=int_data, method="histo")

### Bivariate case: 2 fair coins. H = 2 bits. ###
int_data = rng.integers(low=0, high=2, size=(100000, 2))
h_int_2 = shannon(data=int_data, method="histo")

### Continuous case: uniform distribution in [0, 10]. ###
float_data = rng.random(200000) * 10
h_float = shannon(data=float_data, method="kl")
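To illustrate what the two methods estimate, the following is a minimal NumPy-only sketch of a histogram (plug-in) estimator and a one-dimensional Kozachenko-Leonenko nearest-neighbor estimator. It is not the dynsight or infomeasure implementation, and it may differ from them in details such as bias corrections or distance metric. The names histo_entropy, kl_entropy_1d and digamma_int are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def histo_entropy(samples, base=2.0):
    """Plug-in entropy from the observed frequencies of discrete samples.

    Illustrative helper, not part of dynsight. Each row of a 2D array is
    treated as one joint observation.
    """
    arr = np.asarray(samples)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)
    # Count how often each distinct row occurs.
    _, counts = np.unique(arr, axis=0, return_counts=True)
    p = counts / counts.sum()
    # H = -sum(p * log p), converted from nats to the requested base.
    return float(-np.sum(p * np.log(p)) / np.log(base))


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + float(np.sum(1.0 / np.arange(1, n)))


def kl_entropy_1d(samples, n_neigh=4, base=2.0):
    """Kozachenko-Leonenko entropy estimate for 1D continuous samples.

    Illustrative helper, not part of dynsight. Assumes no repeated values
    and more than n_neigh samples.
    """
    x = np.sort(np.asarray(samples, dtype=np.float64))
    n = x.size
    # Pad with infinities so that out-of-range neighbors are never selected.
    padded = np.concatenate(
        [np.full(n_neigh, -np.inf), x, np.full(n_neigh, np.inf)]
    )
    # In sorted order, the n_neigh nearest neighbors of a point lie within
    # n_neigh positions on either side of it.
    offsets = [o for o in range(-n_neigh, n_neigh + 1) if o != 0]
    dists = np.stack(
        [np.abs(padded[n_neigh + o : n_neigh + o + n] - x) for o in offsets],
        axis=1,
    )
    # Distance from each point to its n_neigh-th nearest neighbor.
    eps = np.sort(dists, axis=1)[:, n_neigh - 1]
    # H = psi(N) - psi(k) + log(c_d) + (d / N) * sum(log eps), with d = 1
    # and c_1 = 2 (the length of the unit ball [-1, 1]).
    h_nats = (
        digamma_int(n)
        - digamma_int(n_neigh)
        + np.log(2.0)
        + np.mean(np.log(eps))
    )
    return float(h_nats / np.log(base))


if __name__ == "__main__":
    rng = np.random.default_rng(seed=42)
    coin = rng.integers(low=0, high=2, size=100000)
    uniform = rng.random(20000) * 10
    print("fair coin (bits):", histo_entropy(coin))
    print("uniform [0, 10] (bits):", kl_entropy_1d(uniform))
```

For a fair coin the histogram estimate should be close to 1 bit, and for the uniform distribution on [0, 10] the nearest-neighbor estimate should be close to log2(10) bits. Changing base only rescales the result by a constant factor.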