dynsight.analysis.shannon

dynsight.analysis.shannon(data, method, base=2.0, n_neigh=4)

Compute the Shannon entropy of a data distribution.

Parameters:
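  • data (NDArray[np.float64]) – The data whose entropy is computed, with shape (n_samples, n_features). One-dimensional arrays of shape (n_samples,) are also accepted, as in the example below, which also passes integer arrays for discrete data.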

  • method (Literal['histo', 'kl']) – The estimator used to compute the Shannon entropy. Use “histo” for discrete variables and “kl” for continuous variables. With “histo”, the “n_neigh” argument is ignored. See the infomeasure documentation (linked in the Notes below) for details.

  • base (float) – The base of the logarithm, which sets the unit of the returned value: 2 (the default) for bits, np.e for nats.

  • n_neigh (int) – The number of nearest neighbors used by the Kozachenko-Leonenko (“kl”) estimator; only relevant when method is “kl”. The default, n_neigh = 4, is the value recommended in the literature.
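For intuition on the discrete case, the sketch below computes a plug-in (frequency-count) estimate: count how often each distinct sample occurs, turn the counts into probabilities, and evaluate -Σ p log p. It is an assumption that this matches what method="histo" does internally, and `plugin_entropy_sketch` is a hypothetical helper, not part of dynsight or infomeasure. Each row of a 2D array is treated as one joint symbol, which is how the bivariate coin example reaches 2 bits.

```python
import math

import numpy as np


def plugin_entropy_sketch(data: np.ndarray, base: float = 2.0) -> float:
    """Plug-in (frequency-count) Shannon entropy of discrete data.

    Hypothetical helper for illustration; not part of dynsight or infomeasure.
    """
    x = np.asarray(data)
    if x.ndim == 1:
        x = x[:, None]  # a 1D array is treated as a single feature
    # Each row is one joint symbol; count the occurrences of every distinct row.
    _, counts = np.unique(x, axis=0, return_counts=True)
    p = counts / counts.sum()  # empirical probabilities, all strictly positive
    # Entropy in nats, converted to the requested base.
    return float(-np.sum(p * np.log(p)) / math.log(base))


rng = np.random.default_rng(seed=42)
one_coin = rng.integers(low=0, high=2, size=100000)
two_coins = rng.integers(low=0, high=2, size=(100000, 2))
print(plugin_entropy_sketch(one_coin))   # close to 1 bit
print(plugin_entropy_sketch(two_coins))  # close to 2 bits
```

Because the estimate only depends on counts, it is only meaningful when the variable takes a small number of repeated values; for continuous data almost every sample is distinct and the result just grows with log(n_samples).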

Returns:

The Shannon entropy of the data, in the unit set by base (a differential entropy estimate when method is “kl”).

Return type:

float
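The effect of base follows from the standard change-of-base identity: the entropy in base b equals the entropy in nats divided by ln b. The worked values below use the fair-coin and uniform cases from the example.

```latex
H_b(X) = -\sum_{x} p(x)\,\log_b p(x) = \frac{H_e(X)}{\ln b}

\text{Fair coin: } H_2 = -2 \cdot \tfrac{1}{2}\log_2 \tfrac{1}{2} = 1\ \text{bit} = \ln 2 \approx 0.693\ \text{nats}

\text{Uniform on } [0, 10]: \quad h_b(X) = -\int_0^{10} \tfrac{1}{10}\,\log_b \tfrac{1}{10}\,dx = \log_b 10 \approx 3.32\ \text{bits}
```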

Notes

This function uses the infomeasure.entropy() function, see https://infomeasure.readthedocs.io/en/latest/guide/entropy/.

Example

import numpy as np
from dynsight.analysis import shannon
rng = np.random.default_rng(seed=42)

### Discrete case: fair coin. H = 1 bit. ###
int_data = rng.integers(low=0, high=2, size=100000)
h_int = shannon(data=int_data, method="histo")

### Bivariate case: 2 fair coins. H = 2 bits. ###
int_data = rng.integers(low=0, high=2, size=(100000, 2))
h_int_2 = shannon(data=int_data, method="histo")

### Continuous case: uniform distribution on [0, 10]. H = log2(10) ≈ 3.32 bits. ###
float_data = rng.random(200000) * 10
h_float = shannon(data=float_data, method="kl")
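The continuous result can be cross-checked with a from-scratch Kozachenko-Leonenko estimate, assuming that is the estimator behind method="kl" (the n_neigh parameter suggests a k-nearest-neighbor method). The sketch uses H = ψ(N) − ψ(k) + log c_d + (d/N) Σ log r_i, where r_i is the distance from sample i to its k-th nearest neighbor and c_d is the volume of the unit d-ball. `kl_entropy_sketch` and `digamma_int` are hypothetical helpers, not dynsight or infomeasure functions, and the brute-force distance matrix limits this to a few thousand samples.

```python
import math

import numpy as np


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def kl_entropy_sketch(data: np.ndarray, n_neigh: int = 4, base: float = 2.0) -> float:
    """Kozachenko-Leonenko k-NN differential entropy (hypothetical, illustrative)."""
    x = np.asarray(data, dtype=np.float64)
    if x.ndim == 1:
        x = x[:, None]  # a 1D array is treated as a single feature
    n_samples, n_features = x.shape
    # Brute-force Euclidean distances: O(N^2) memory, small samples only.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    # Sorted row i is [0 (self), r_1, r_2, ...], so index n_neigh is the k-th neighbor.
    r_k = np.partition(dist, n_neigh, axis=1)[:, n_neigh]
    # Log-volume of the unit ball in n_features dimensions.
    log_c_d = (n_features / 2) * math.log(math.pi) - math.lgamma(n_features / 2 + 1)
    h_nats = (
        digamma_int(n_samples)
        - digamma_int(n_neigh)
        + log_c_d
        + n_features * float(np.mean(np.log(r_k)))
    )
    return h_nats / math.log(base)


rng = np.random.default_rng(seed=42)
float_data = rng.random(2000) * 10  # uniform on [0, 10)
print(kl_entropy_sketch(float_data), math.log2(10))  # estimate vs. exact 3.32 bits
```

Unlike the plug-in estimate, this needs no binning: the local density around each sample is inferred from how far away its k-th neighbor is, which is why larger n_neigh trades variance for bias.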