Drawing a clustered heatmap using seaborn

Overview:

  • The function clustermap() in seaborn draws a hierarchically clustered heatmap.
  • A clustered heatmap differs from an ordinary heatmap in the following ways:
    • The rows and columns of the heatmap are reordered by hierarchical clustering, so that similar rows and similar columns are placed next to each other.
    • Dendrograms are drawn for the rows and the columns of the heatmap.
  • Clustered heatmaps are often easier to read than ordinary heatmaps because cells with similar values are grouped into clusters. These clusters are complemented by the dendrograms, which summarize how the clusters were formed, including the distances at which they were merged.
  • The seaborn visualization library uses the SciPy function linkage() (from scipy.cluster.hierarchy) to transform the raw data into a hierarchy of clusters.
  • The SciPy function linkage() supports several methods for computing the distance between two clusters when deciding which pair of clusters to merge at each step: the nearest point algorithm ("single"), the farthest point algorithm ("complete"), UPGMA ("average"), WPGMA ("weighted"), centroid ("centroid"), median ("median") and Ward variance minimization ("ward"). The method is selected through the method parameter of clustermap(), which defaults to "average". The centroid, median and ward methods are only correctly defined for the Euclidean metric.
  • Just as the distance between two clusters can be defined in several ways, so can the distance between two individual points. seaborn passes the metric on to SciPy, which supports the following distances: euclidean, minkowski, cityblock, seuclidean, sqeuclidean, cosine, correlation, hamming, jaccard, chebyshev, canberra, braycurtis, mahalanobis, yule, dice, kulsinski, rogerstanimoto, russellrao, sokalmichener, sokalsneath and wminkowski. The distance used is controlled through the metric parameter of clustermap(), which defaults to "euclidean". Note that kulsinski and wminkowski have been removed from recent SciPy releases.
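
The clustering step that clustermap() delegates to SciPy can be tried on its own. The following is a minimal sketch, assuming NumPy and SciPy are installed; the four-row array is made-up data, and the sketch calls linkage() directly rather than reproducing seaborn's internal code.

```python
# A minimal sketch, assuming NumPy and SciPy are installed
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Four hypothetical observations (rows) with three features each
data = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [8.0, 9.0, 7.5],
    [8.2, 8.8, 7.7],
])

# Build the cluster hierarchy; "average" and "euclidean" are the
# defaults that clustermap() uses for method and metric
Z = linkage(data, method="average", metric="euclidean")

# Each row of Z records one merge:
# [index of cluster A, index of cluster B, merge distance, size of new cluster]
print(Z)

# Leaf order of the dendrogram: similar rows end up next to each other
order = leaves_list(Z)
print(order)
```

clustermap() builds one such hierarchy for the rows and, from the transposed data, another one for the columns, then reorders the heatmap according to the leaf order of each.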
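
To make the role of the method parameter concrete, the sketch below implements naive agglomerative clustering in plain Python for three of the methods: single (nearest point), complete (farthest point) and average (UPGMA). The function name agglomerate() and the four one-dimensional points are hypothetical, chosen only for illustration; SciPy's linkage() uses far more efficient algorithms and also supports the other methods listed above.

```python
def agglomerate(points, dist, method):
    """Naive agglomerative clustering; returns the list of merges performed."""
    clusters = [[i] for i in range(len(points))]  # each point starts in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest inter-cluster distance
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_dists = [dist(points[i], points[j])
                              for i in clusters[a] for j in clusters[b]]
                if method == "single":      # nearest point algorithm
                    d = min(pair_dists)
                elif method == "complete":  # farthest point algorithm
                    d = max(pair_dists)
                elif method == "average":   # UPGMA: mean of all pairwise distances
                    d = sum(pair_dists) / len(pair_dists)
                else:
                    raise ValueError(f"unsupported method: {method}")
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        # Replace the two merged clusters by their union
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


def dist(p, q):
    # Absolute difference as the point-to-point distance
    return abs(p - q)


points = [0, 2, 5, 9]  # hypothetical 1-D observations (indices 0 to 3)

print(agglomerate(points, dist, "single"))
# [([0, 1], 2), ([0, 1, 2], 3), ([0, 1, 2, 3], 4)]
print(agglomerate(points, dist, "complete"))
# [([0, 1], 2), ([2, 3], 4), ([0, 1, 2, 3], 9)]
```

Both methods first merge the points at indices 0 and 1. Single linkage then attaches the point at index 2 (value 5) to that pair, because its nearest member is only 3 away, whereas complete linkage pairs it with the point at index 3 (distance 4), because the farthest member of the pair is 5 away. The same data therefore yields differently shaped dendrograms.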
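
The choice of metric changes what "similar" means. As a sketch, the plain-Python functions below reimplement five of the listed metrics from their standard definitions; SciPy provides them under the same names in scipy.spatial.distance, and the vectors u, v and w are hypothetical.

```python
import math


def euclidean(u, v):
    # Straight-line distance
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cityblock(u, v):
    # Sum of absolute differences (Manhattan distance)
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    # Largest absolute difference along any coordinate
    return max(abs(a - b) for a, b in zip(u, v))


def cosine(u, v):
    # 1 minus the cosine of the angle between the vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)


def correlation(u, v):
    # Cosine distance of the mean-centred vectors (1 minus Pearson's r)
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


u = [1, 2, 3]
v = [2, 4, 6]  # same shape as u, twice the magnitude
w = [3, 2, 1]  # opposite trend to u

print(round(euclidean(u, v), 3))    # 3.742
print(cityblock(u, v))              # 6
print(chebyshev(u, v))              # 3
print(cosine(u, v))                 # close to 0: same direction
print(correlation(u, v))            # close to 0: perfectly correlated
print(round(correlation(u, w), 6))  # 2.0: perfectly anti-correlated
```

Under the Euclidean, cityblock and Chebyshev metrics u and v are clearly apart, while the cosine and correlation metrics treat them as practically identical because only the direction or the trend of the values matters.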

Example:

# Example Python program that creates a clustered heatmap using the Python
# visualization library Seaborn
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# GDP data for six states for 12 months
s1 = [100, 94, 56, 76, 81, 91, 51, 55, 72, 66, 60, 58]
s2 = [82, 81, 94, 96, 93, 84, 80, 82, 84, 86, 81, 78]
s3 = [65, 61, 66, 62, 67, 71, 69, 73, 68, 64, 66, 70]
s4 = [150, 140, 145, 151, 156, 152, 160, 165, 159, 149, 155, 162]
s5 = [75, 74, 76, 78, 80, 82, 85, 81, 77, 73, 75, 67]
s6 = [80, 75, 70, 72, 67, 65, 62, 63, 65, 60, 66, 69]

# Months
months = ["Jan", "Feb", "Mar", "Apr",
          "May", "Jun", "Jul", "Aug",
          "Sep", "Oct", "Nov", "Dec"]

# Python dictionary of states vs their GDPs
d1 = {"State1": s1,
      "State2": s2,
      "State3": s3,
      "State4": s4,
      "State5": s5,
      "State6": s6}

# Create a pandas DataFrame with the months as rows and the states as columns
df = pd.DataFrame(data=d1, index=months)
print(df)
print(df.columns)
print(df.index)

# Create a clustermap using seaborn; by default both the rows and the columns
# are clustered with average linkage and Euclidean distance
sns.clustermap(df)
plt.show()

Output:

Clustered heatmap
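
The method and metric parameters described in the overview can be passed directly to clustermap(). The sketch below reruns the GDP example with complete linkage and correlation distance (an arbitrary combination chosen for illustration) and reads the resulting row and column order from the dendrogram_row and dendrogram_col attributes of the returned ClusterGrid. It assumes seaborn, SciPy, pandas and matplotlib are installed, and selects the non-interactive "Agg" backend so that it runs without a display.

```python
# A minimal sketch, assuming seaborn, SciPy, pandas and matplotlib are installed
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line to display the figure
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Same GDP data as in the example above
df = pd.DataFrame({
    "State1": [100, 94, 56, 76, 81, 91, 51, 55, 72, 66, 60, 58],
    "State2": [82, 81, 94, 96, 93, 84, 80, 82, 84, 86, 81, 78],
    "State3": [65, 61, 66, 62, 67, 71, 69, 73, 68, 64, 66, 70],
    "State4": [150, 140, 145, 151, 156, 152, 160, 165, 159, 149, 155, 162],
    "State5": [75, 74, 76, 78, 80, 82, 85, 81, 77, 73, 75, 67],
    "State6": [80, 75, 70, 72, 67, 65, 62, 63, 65, 60, 66, 69],
}, index=months)

# Farthest point (complete) linkage combined with correlation distance,
# instead of the defaults method="average" and metric="euclidean"
g = sns.clustermap(df, method="complete", metric="correlation")

# The ClusterGrid exposes the leaf order of each dendrogram as positional indices
row_order = [df.index[i] for i in g.dendrogram_row.reordered_ind]
col_order = [df.columns[i] for i in g.dendrogram_col.reordered_ind]
print("Months in clustered order:", row_order)
print("States in clustered order:", col_order)

plt.close("all")
```

With correlation distance, states whose GDP rises and falls together over the year tend to be grouped regardless of their absolute level, whereas with the default Euclidean distance State4, whose values are roughly twice as large as most of the others, tends to end up far from every other state.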


Copyright 2024 © pythontic.com