Condensed distance matrix - Calculating pairwise distances between data-points using SciPy

Overview:

  • Given a set of n points, the SciPy function pdist() computes the distance between every pair of points.

  • The pairwise distances are returned as a condensed distance matrix: a flat, 1-dimensional ndarray holding the upper triangular portion of the full distance matrix, that is, the entries above the main diagonal, read row by row. For n points it contains n(n-1)/2 values.

  • A distance matrix is symmetric: the lower triangular portion mirrors the upper triangular portion, and the main diagonal contains only zeros. The upper triangle alone is therefore enough to describe it.

 

Full distance matrix for the six birds used in the example:

             Parrot      Crow        Sparrow     Seagull     Snowy Owl   Duck
Parrot       0           0.05040794  0.20039966  1.30000105  1.50002324  0.50262991
Crow         0.05040794  0           0.15013015  1.35000836  1.55007018  0.55302356
Sparrow      0.20039966  0.15013015  0           1.50004033  1.7001297   0.70291963
Seagull      1.30000105  1.35000836  1.50004033  0           0.20024984  0.8017537
Snowy Owl    1.50002324  1.55007018  1.7001297   0.20024984  0           1.00092407
Duck         0.50262991  0.55302356  0.70291963  0.8017537   1.00092407  0

  • The default metric used for distance calculation is Euclidean.

  • The condensed distance matrix can be passed directly to the SciPy linkage() function to perform agglomerative hierarchical clustering on the data points.

  • Besides the default, pdist() supports several other metrics through its metric parameter, including 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'jensenshannon', 'kulczynski1', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean' and 'yule'. The exact set available depends on the installed SciPy version.
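The last two points can be sketched together: choose a non-default metric and hand the condensed result to linkage(). This is a minimal illustration, not a recommended clustering setup. The bird data is copied from this article's example, while the choice of 'cityblock' and of average linkage, and the variable names, are assumptions made only for the sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Body weight and brain weight of six birds (same data as the article's example)
data_points = np.array([[0.5,  0.01365],   # Parrot
                        [0.45, 0.00725],   # Crow
                        [0.3,  0.001],     # Sparrow
                        [1.8,  0.012],     # Seagull
                        [2.0,  0.022],     # Snowy Owl
                        [1.0,  0.065]])    # Duck

# Pairwise Manhattan (city block) distances instead of the default Euclidean
condensed_cityblock = pdist(data_points, metric="cityblock")

# linkage() accepts the condensed matrix as-is; "average" is an arbitrary
# choice of linkage method for this sketch
merges = linkage(condensed_cityblock, method="average")

print("Number of pairwise distances:", len(condensed_cityblock))
print("Linkage matrix shape:", merges.shape)
```

For n points the linkage matrix has n-1 rows, one per merge step, so the six birds produce five merges.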

Example:

# Example Python program that returns pairwise distances
# between two-dimensional points as a condensed distance matrix
# (a one-dimensional NumPy array)
import scipy.spatial.distance as dist
import numpy as np

# Each row holds the body weight and brain weight of one bird
data_points = np.array([[0.5,    0.01365],     # Parrot
                        [0.45,   0.00725],     # Crow
                        [0.3,    0.001],       # Sparrow
                        [1.8,    0.012],       # Seagull
                        [2,      0.022],       # Snowy Owl
                        [1,      0.065]        # Duck
                       ])

# Find the pairwise distances
ds = dist.pdist(data_points)
print("Condensed distance matrix:")
print(type(ds))
print("Distances:")
print(ds)
print("Length of the distance matrix:")
print(len(ds))

Output:

Condensed distance matrix:
<class 'numpy.ndarray'>
Distances:
[0.05040794 0.20039966 1.30000105 1.50002324 0.50262991 0.15013015
 1.35000836 1.55007018 0.55302356 1.50004033 1.7001297  0.70291963
 0.20024984 0.8017537  1.00092407]
Length of the distance matrix:
15
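To relate the 15 values above to the full 6 x 6 matrix, the condensed array can be expanded with squareform() from scipy.spatial.distance. The sketch below also looks up a single pair directly in the condensed array. The helper condensed_index() is a hypothetical name introduced here for illustration and is not part of SciPy; it assumes the row-by-row upper-triangle ordering that pdist() uses.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

data_points = np.array([[0.5,  0.01365],   # Parrot
                        [0.45, 0.00725],   # Crow
                        [0.3,  0.001],     # Sparrow
                        [1.8,  0.012],     # Seagull
                        [2.0,  0.022],     # Snowy Owl
                        [1.0,  0.065]])    # Duck

n = len(data_points)
condensed = pdist(data_points)      # 15 values: upper triangle, row by row
square = squareform(condensed)      # full symmetric 6 x 6 matrix, zero diagonal

def condensed_index(i, j, n):
    """Position of the distance between points i and j in the condensed array.

    Hypothetical helper for illustration; not a SciPy function.
    """
    if i == j:
        raise ValueError("the diagonal is not stored in the condensed form")
    if i > j:
        i, j = j, i                 # the matrix is symmetric, so order is irrelevant
    return n * i - i * (i + 1) // 2 + (j - i - 1)

k = condensed_index(1, 2, n)        # Crow (row 1) and Sparrow (row 2)
print("Square matrix shape:", square.shape)
print("Condensed position of Crow-Sparrow:", k)
print("Same distance in both forms:", np.isclose(condensed[k], square[1, 2]))
```

Calling squareform() on a square distance matrix goes the other way and returns the condensed form, so the two representations can be converted back and forth.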

 


Copyright 2025 © pythontic.com