# Finding Correlation Coefficient Between Columns Of A Pandas Dataframe

## Overview:

• Correlation coefficients measure how strongly two variables are related to each other. The relationship may be linear and positive (both variables increase together), linear and negative (one variable increases as the other decreases), or monotonic. In a monotonic relationship the variables move in a consistent direction, but not necessarily at a constant rate.
• The pandas DataFrame class has the method corr(), which computes the pairwise correlation coefficients between columns using one of three methods: the Pearson correlation method, the Kendall Tau correlation method, or the Spearman correlation method. The correlation coefficients calculated using these methods range from -1 to +1.
• While the corr() method finds the correlation coefficients between the columns of a single DataFrame instance, the corrwith() method computes correlation coefficients between the matching rows or columns of two different DataFrame instances. The one-dimensional pandas.Series also has a corr() method, which finds the correlation between the variables represented by two pandas.Series objects.
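
As a minimal sketch of the Series case, two variables can be held as separate pandas.Series objects and compared with Series.corr(), which returns a single number instead of a matrix. The sample values and the variable names `x`, `y`, and `r` are illustrative assumptions.

```python
import pandas as pd

# Illustrative data: two variables stored as separate Series.
x = pd.Series([20, 25, 30, 35, 40, 45])
y = pd.Series([10, 9, 9, 8, 8, 7])

# Series.corr() returns one coefficient; the default method is Pearson.
r = x.corr(y)
print("Pearson correlation between x and y:", round(r, 5))
```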

## Pearson correlation coefficient:

• The Pearson correlation coefficient is defined as the covariance of two variables divided by the product of their standard deviations. It evaluates the linear relationship between two variables and has a value between -1 and +1.
• The value 1 indicates a perfect positive linear correlation between the variables x and y. The value 0 indicates that there is no linear relationship between x and y (they may still be related in a non-linear way). The value -1 indicates a perfect negative (inverse) linear correlation between x and y.
• The Pearson correlation coefficient is also called the Pearson product-moment correlation coefficient.
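
The definition above can be checked directly. The following sketch, using illustrative data and hypothetical variable names, divides the covariance by the product of the standard deviations and compares the result with the value returned by corr().

```python
import pandas as pd

# Illustrative data with a strong negative linear relationship.
df = pd.DataFrame({"X": [20, 25, 30, 35, 40, 45],
                   "Y": [10, 9, 9, 8, 8, 7]})

# Covariance divided by the product of the standard deviations.
# Series.cov() and Series.std() both default to the sample (ddof=1)
# convention, so the normalisation factors cancel out.
r_manual = df["X"].cov(df["Y"]) / (df["X"].std() * df["Y"].std())

# The same quantity as computed by DataFrame.corr().
r_pandas = df.corr(method="pearson").loc["X", "Y"]

print("From the definition:", r_manual)
print("From corr():        ", r_pandas)
```

Both values should agree up to floating-point rounding.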

## Kendall Tau correlation coefficient:

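• The Kendall Tau correlation coefficient is a rank-based measure. It quantifies the difference between the number of concordant pairs and the number of discordant pairs of observations. A pair of observations is concordant when both variables order the two observations the same way, and discordant when they order them in opposite ways.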
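
To make the pair counting concrete, here is a small pure-Python sketch that counts concordant, discordant, and tied pairs by hand. It assumes the tau-b variant (which corrects for ties); that assumption is what reproduces the Kendall value shown in the example output later in this article. All variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

x = [20, 25, 30, 35, 40, 45]
y = [10, 9, 9, 8, 8, 7]

concordant = discordant = ties_x = ties_y = 0

# Compare every pair of observations once.
for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
    dx, dy = x2 - x1, y2 - y1
    if dx == 0:
        ties_x += 1          # pair tied in x
    if dy == 0:
        ties_y += 1          # pair tied in y
    if dx * dy > 0:
        concordant += 1      # both variables move the same way
    elif dx * dy < 0:
        discordant += 1      # the variables move in opposite ways

n_pairs = len(x) * (len(x) - 1) // 2

# Tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2)), where n1 and n2 are
# the numbers of pairs tied in x and in y respectively.
tau_b = (concordant - discordant) / sqrt((n_pairs - ties_x) * (n_pairs - ties_y))

print("Concordant:", concordant, "Discordant:", discordant)
print("Kendall tau-b:", round(tau_b, 6))
```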

## Spearman correlation coefficient:

• The Spearman correlation method is a nonparametric measure of the strength and direction of the monotonic relationship between two variables. It is equivalent to the Pearson correlation computed on the ranks of the values.
• This method is commonly chosen when the data is not normally distributed or when the sample size is small (for example, less than 30).
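
The link between Spearman and Pearson can be sketched as follows: ranking each column with DataFrame.rank() and then taking the Pearson correlation of the ranks should give the same number as corr(method="spearman"). The data and variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"X": [20, 25, 30, 35, 40, 45],
                   "Y": [10, 9, 9, 8, 8, 7]})

# Replace each value by its rank within its column.
# Tied values receive the average of their ranks (the default).
ranks = df.rank()

# Pearson correlation of the ranks ...
rho_from_ranks = ranks.corr(method="pearson").loc["X", "Y"]

# ... compared with the built-in Spearman method.
rho_spearman = df.corr(method="spearman").loc["X", "Y"]

print("Pearson on ranks:", rho_from_ranks)
print("Spearman:        ", rho_spearman)
```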

## Example - Finding correlation coefficient between columns of the same DataFrame instance:

    import pandas as pd

    values = {"X": [20, 25, 30, 35, 40, 45],
              "Y": [10, 9, 9, 8, 8, 7]}

    dataFrame = pd.DataFrame(data=values)
    print("DataFrame:")
    print(dataFrame)

    # Pairwise Pearson correlation between the columns X and Y
    correlation = dataFrame.corr(method="pearson")
    print("Pearson correlation coefficient:")
    print(correlation)

    # Pairwise Kendall Tau correlation between the columns X and Y
    correlation = dataFrame.corr(method="kendall")
    print("Kendall Tau correlation coefficient:")
    print(correlation)

    # Pairwise Spearman rank correlation between the columns X and Y
    correlation = dataFrame.corr(method="spearman")
    print("Spearman rank correlation:")
    print(correlation)

## Output:

    DataFrame:
        X   Y
    0  20  10
    1  25   9
    2  30   9
    3  35   8
    4  40   8
    5  45   7
    Pearson correlation coefficient:
             X        Y
    X  1.00000 -0.96833
    Y -0.96833  1.00000
    Kendall Tau correlation coefficient:
              X         Y
    X  1.000000 -0.930949
    Y -0.930949  1.000000
    Spearman rank correlation:
              X         Y
    X  1.000000 -0.971008
    Y -0.971008  1.000000

## Example - Finding correlation coefficient between rows of two different DataFrame instances:

    import pandas as pd

    dataValues1 = [(8, 9, 10, 11, 12, 13, 14, 15, 16),
                   (8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5)]
    dataValues2 = [(2, 1.5, 1, 1.5, 3, 3, 2, 2.5, 3),
                   (2.1, 1.5, 1.2, 1.4, 3.2, 3.1, 2.2, 2.53, 3.2)]

    dataFrame1 = pd.DataFrame(data=dataValues1)
    dataFrame2 = pd.DataFrame(data=dataValues2)
    print("DataFrame1:")
    print(dataFrame1)
    print("DataFrame2:")
    print(dataFrame2)

    # Find Pearson correlation coefficient between rows of different DataFrames
    pearsonCorrelation = dataFrame1.corrwith(dataFrame2, axis=1)
    print("Pearson correlation coefficient between rows of dataFrame1 and dataFrame2: ")
    print(pearsonCorrelation)

    # Find Kendall Tau correlation coefficient between rows of different DataFrames
    kendallCorrelation = dataFrame1.corrwith(dataFrame2, axis=1, method="kendall")
    print("Kendall Tau correlation coefficient between rows of dataFrame1 and dataFrame2: ")
    print(kendallCorrelation)

    # Find Spearman rank correlation between rows of different DataFrames
    spearmanCorrelation = dataFrame1.corrwith(dataFrame2, axis=1, method="spearman")
    print("Spearman rank correlation between rows of dataFrame1 and dataFrame2: ")
    print(spearmanCorrelation)

## Output:

    DataFrame1:
         0    1     2     3     4     5     6     7     8
    0  8.0  9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0
    1  8.5  9.5  10.5  11.5  12.5  13.5  14.5  15.5  16.5
    DataFrame2:
         0    1    2    3    4    5    6     7    8
    0  2.0  1.5  1.0  1.5  3.0  3.0  2.0  2.50  3.0
    1  2.1  1.5  1.2  1.4  3.2  3.1  2.2  2.53  3.2
    Pearson correlation coefficient between rows of dataFrame1 and dataFrame2:
    0    0.639010
    1    0.645101
    dtype: float64
    Kendall Tau correlation coefficient between rows of dataFrame1 and dataFrame2:
    0    0.449013
    1    0.422577
    dtype: float64
    Spearman rank correlation between rows of dataFrame1 and dataFrame2:
    0    0.632687
    1    0.669462
    dtype: float64
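
The axis argument decides whether corrwith() pairs up rows or columns. As a hedged sketch, transposing both DataFrames and calling corrwith() with its default axis (column-wise) should give the same Pearson values as the row-wise call with axis=1. The names `df1`, `df2`, `by_rows`, and `by_columns` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([(8, 9, 10, 11, 12, 13, 14, 15, 16),
                    (8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5)])
df2 = pd.DataFrame([(2, 1.5, 1, 1.5, 3, 3, 2, 2.5, 3),
                    (2.1, 1.5, 1.2, 1.4, 3.2, 3.1, 2.2, 2.53, 3.2)])

# Row-wise: correlate row i of df1 with row i of df2.
by_rows = df1.corrwith(df2, axis=1)

# Column-wise (the default axis=0) on the transposed frames:
# each original row is now a column, matched by its label.
by_columns = df1.T.corrwith(df2.T)

print(by_rows)
print(by_columns)
```

Rows and columns are matched by label, so both DataFrames need the same index (for axis=1) or the same column labels (for axis=0) for the pairs to line up.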