Overview:
- For a given set of points, the SciPy pdist() function (in scipy.spatial.distance) computes and returns the pairwise distances between them.
- The pairwise distances are returned as a condensed distance matrix: a flat, 1-dimensional ndarray holding the upper triangular portion of the full distance matrix, that is, the entries above the main diagonal. For n points it contains n(n-1)/2 values.
- A distance matrix is symmetric: the lower triangular portion mirrors the upper triangular portion, and the main diagonal contains only zeros.
|           | Parrot     | Crow       | Sparrow    | Seagull    | Snowy Owl  | Duck       |
|-----------|------------|------------|------------|------------|------------|------------|
| Parrot    | 0          | 0.05040794 | 0.20039966 | 1.30000105 | 1.50002324 | 0.50262991 |
| Crow      | 0.05040794 | 0          | 0.15013015 | 1.35000836 | 1.55007018 | 0.55302356 |
| Sparrow   | 0.20039966 | 0.15013015 | 0          | 1.50004033 | 1.7001297  | 0.70291963 |
| Seagull   | 1.30000105 | 1.35000836 | 1.50004033 | 0          | 0.20024984 | 0.8017537  |
| Snowy Owl | 1.50002324 | 1.55007018 | 1.7001297  | 0.20024984 | 0          | 1.00092407 |
| Duck      | 0.50262991 | 0.55302356 | 0.70291963 | 0.8017537  | 1.00092407 | 0          |
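The relationship between the condensed form and the full symmetric matrix can be sketched as follows. The points here are made-up 2-D coordinates chosen for easy arithmetic, not the bird data behind the table above. squareform() from scipy.spatial.distance converts between the two layouts.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical 2-D points (an assumption for illustration, not the bird data)
points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0],
                   [0.0, 1.0]])

# Condensed form: flat array of the entries above the main diagonal
condensed = pdist(points)  # default metric is Euclidean

# Full form: symmetric n x n matrix with zeros on the diagonal
square = squareform(condensed)

n = len(points)
# Reading the upper triangle of the full matrix row by row
# (excluding the diagonal) gives back the condensed array
upper = square[np.triu_indices(n, k=1)]

print("Condensed:", condensed)
print("Square form:\n", square)
```

For n = 4 points the condensed array has 4 * 3 / 2 = 6 entries, ordered (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).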
- The default metric used for the distance calculation is Euclidean.
- The condensed distance matrix can be passed directly to the SciPy linkage() function to build hierarchical clusters in an agglomerative fashion.
- Besides the default 'euclidean', pdist() supports several other metrics, including 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'jensenshannon', 'kulczynski1', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean' and 'yule'. The exact set available depends on the installed SciPy version.
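As a minimal sketch of the points above, the snippet below selects a metric through the metric argument and passes a condensed result to linkage(). The four points are hypothetical and form two obvious groups. The choice of method="average" and the fcluster() call that cuts the tree into two clusters are illustrative assumptions, not something the article prescribes.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical points forming two well-separated groups (assumption)
points = np.array([[0.0, 0.0],
                   [0.0, 1.0],
                   [10.0, 10.0],
                   [10.0, 11.0]])

# Same points, three different metrics
euclid = pdist(points)                          # default: 'euclidean'
manhattan = pdist(points, metric="cityblock")   # sum of absolute differences
cheb = pdist(points, metric="chebyshev")        # largest coordinate difference

# linkage() accepts the condensed distance matrix directly
Z = linkage(euclid, method="average")

# Cut the hierarchy into at most two flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")

print("Euclidean:", euclid)
print("Cityblock:", manhattan)
print("Chebyshev:", cheb)
print("Cluster labels:", labels)
```

Each row of Z records one merge step, so for n points the linkage matrix has n - 1 rows and 4 columns.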
Example:
# Example Python program that returns pairwise distances
# Body weight vs brain weight of birds
# Find the pairwise distances
Output:
Condensed distance matrix: