Cleaning Cluster Population

Sometimes, the clusters obtained with Onion Clustering analysis can be very small. To make the results easier to interpret, it can be useful to remove them by assigning their members to the cluster of unclassified particles. This is achieved with the function data_processing.cleaning_cluster_population(), which assigns every cluster whose population falls below a given threshold to a cluster selected by the user.
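To make the idea concrete, here is a minimal NumPy sketch of threshold-based relabeling on a toy 2D label array. It is an illustration only, not the dynsight implementation: the helper name relabel_small_clusters and the toy data are hypothetical, and the sketch assumes that the threshold is a fraction of the total number of labels.

```python
import numpy as np


def relabel_small_clusters(labels, threshold, assigned_env=-1):
    """Illustrative stand-in (hypothetical), not the dynsight function."""
    labels = np.asarray(labels)
    cleaned = labels.copy()
    # Population of every label, expressed as a fraction of all data points.
    values, counts = np.unique(labels, return_counts=True)
    fractions = counts / labels.size
    for value, fraction in zip(values, fractions):
        # Clusters below the threshold are merged into the chosen label.
        if value != assigned_env and fraction < threshold:
            cleaned[labels == value] = assigned_env
    return cleaned


# Toy labels: 4 particles x 5 frames, with -1 marking unclassified points.
toy_labels = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 2, 2],
    [0, 0, 0, -1, 2],
    [0, 0, 0, 0, 2],
])
toy_cleaned = relabel_small_clusters(toy_labels, threshold=0.1)
```

In this toy case, cluster 1 holds 1 of 20 points (0.05) and is merged into the unclassified label, while cluster 2 holds 4 of 20 points (0.2) and is kept.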

At the end of this tutorial, you will find links to download the full Python script and its input files.

As an example, we consider the output of the analysis computed in the spatial denoising tutorial. Briefly, we consider the denoised TimeSOAP descriptor, which can be obtained from:

import numpy as np
from pathlib import Path
import dynsight
from dynsight.trajectory import Trj
from dynsight.data_processing import cleaning_cluster_population

files_path = Path("source/_static/simulations")
trj = Trj.init_from_xtc(
    traj_file=files_path / "ice_water_ox.xtc",
    topo_file=files_path / "ice_water_ox.gro",
)

_, tsoap = trj.get_timesoap(
    r_cut=10,
    n_max=8,
    l_max=8,
    n_jobs=4,  # Adjust n_jobs according to your computer capabilities
)

sliced_trj = trj.with_slice(slice(0, -1, 1))
sp_denoised_tsoap = tsoap.spatial_average(
    trj=sliced_trj,
    r_cut=10,
    n_jobs=4,  # Adjust n_jobs according to your computer capabilities
)

delta_t_list, n_clust, unclass_frac, labels = sp_denoised_tsoap.get_onion_analysis(
    delta_t_min=2,
    delta_t_num=20,
    fig1_path=files_path / "denoised_onion_analysis.png",
    fig2_path=files_path / "cluster_population.png",
)

For further details, users should refer to the spatial denoising tutorial.

Figure cluster_population.png shows the population of every cluster; each color is a different cluster, and blue refers to the unclassified fraction:

../_images/cluster_population.png

Before cleaning the clusters, we save the output of the Onion analysis in an array:

onion_output = np.array([delta_t_list, n_clust, unclass_frac]).T
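As a small illustration of the layout this produces, the sketch below stacks three toy sequences in the same way. The values are made up for the example and are not results of the analysis.

```python
import numpy as np

# Hypothetical toy values, one entry per time window.
toy_delta_t = [2, 5, 10]
toy_n_clust = [4, 3, 2]
toy_unclass = [0.10, 0.25, 0.40]

# Stacking gives one row per quantity; transposing gives one row per window.
toy_output = np.array([toy_delta_t, toy_n_clust, toy_unclass]).T

# Columns: 0 = time window, 1 = number of clusters, 2 = unclassified fraction.
toy_windows = toy_output[:, 0]
```

Because the three sequences are combined into one array, the integer columns are stored as floats alongside the unclassified fraction.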

The small clusters can be removed and assigned to the unclassified fraction using the function data_processing.cleaning_cluster_population():

cleaned_labels = cleaning_cluster_population(labels, threshold=0.05, assigned_env=-1)

where cleaned_labels has the same shape as labels. We can now reproduce the plot of the number of clusters and the unclassified fraction from the cleaned data. The function onion.plot_smooth.plot_time_res_analysis(), which produces this plot, requires an array containing the list of time windows, the number of clusters at every ∆t, and the unclassified fraction. Therefore, before plotting, we build this array by copying the list of time windows from the Onion analysis output and computing the number of clusters and the unclassified fraction from the cleaned labels:

delta_t_list = onion_output[:, 0]  # Since unchanged, windows can be copied from above.

n_clust = np.zeros(delta_t_list.shape[0], dtype=np.int64)
unclass_frac = np.zeros(delta_t_list.shape[0])
for i in range(delta_t_list.shape[0]):
    window_labels = cleaned_labels[:, :, i]
    # Count the distinct labels, excluding the unclassified label (-1).
    n_clust[i] = np.count_nonzero(np.unique(window_labels) != -1)
    # Fraction of data points left unclassified in this time window.
    unclass_frac[i] = np.mean(window_labels == -1)

cleaned_onion_output = np.array([delta_t_list, n_clust, unclass_frac]).T

dynsight.onion.plot_smooth.plot_time_res_analysis("cleaned_onion_analysis.png", cleaned_onion_output)

The left panel shows the results of Onion clustering on the denoised time-series (denoised_onion_analysis.png from the spatial denoising tutorial), while the right panel shows cleaned_onion_analysis.png.

../_images/denoised_onion_analysis.png ../_images/cleaned_onion_analysis.png
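As an optional sanity check, one can verify that no surviving cluster falls below the chosen threshold. The sketch below is a hypothetical helper, not part of dynsight. It assumes that labels are stored as (particles, frames, time windows), that -1 marks unclassified points, and that the threshold applies independently within each time window. It is shown on toy data.

```python
import numpy as np


def smallest_cluster_fraction(labels_3d, unclassified=-1):
    """Smallest population fraction among classified clusters, per window."""
    n_windows = labels_3d.shape[2]
    smallest = np.full(n_windows, np.nan)
    for i in range(n_windows):
        window = labels_3d[:, :, i]
        values, counts = np.unique(window, return_counts=True)
        keep = values != unclassified
        # Windows with no classified cluster are left as NaN.
        if keep.any():
            smallest[i] = counts[keep].min() / window.size
    return smallest


# Toy labels: 4 particles x 5 frames x 2 time windows.
toy = np.zeros((4, 5, 2), dtype=int)
toy[0, 0, 0] = 1   # cluster 1 holds 1/20 of the points in window 0
toy[:, :, 1] = -1  # window 1 is fully unclassified
result = smallest_cluster_fraction(toy)
```

Applied to cleaned_labels, every non-NaN entry would be expected to be at least as large as the threshold passed to the cleaning step, under the assumptions stated above.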

Full scripts and input files

⬇️ Download the .gro file
⬇️ Download the .xtc file
⬇️ Download Python Script