# Compute Correlation Coefficient For The Variables Represented By Two pandas.Series Objects

## Overview:

• Most data analysis done with the Python library pandas involves the data structures Series and DataFrame. A pandas.Series is a one-dimensional, labeled, mutable array whose values share a single dtype (which may be object, allowing mixed Python values), and a pandas.DataFrame is a two-dimensional, size-mutable table whose columns can have different dtypes. Both Series and DataFrame are built on top of NumPy's ndarray as the underlying data structure.
• The classes pandas.Series and pandas.DataFrame provide methods for holding and reshaping data and for performing statistical and mathematical operations on it.
• The method Series.corr() computes the correlation between two variables represented by two pandas.Series instances. The DataFrame.corr() method computes the pairwise correlation coefficients between the columns of a pandas.DataFrame and returns them as a correlation matrix.
• Correlation is a statistical measure of how strongly two variables are related, if a relationship exists between them at all. Examples include per capita income and life expectancy, or forest coverage and annual rainfall of a region. Correlation is measured by the correlation coefficient (r).
• The value of the correlation coefficient is always in the range of -1 to +1.
• When the correlation coefficient is +1, the two variables are perfectly correlated in the positive direction: whenever one variable increases, the other also increases. The increase does not have to be one-to-one. With Pearson's method, if one variable always increases by 0.5 for every increase of 1 in the other, the coefficient is still +1, because the points lie exactly on an upward-sloping straight line. When the correlation coefficient is -1, the two variables are perfectly negatively correlated: whenever one variable increases, the other decreases. Values between these extremes indicate a weaker relationship, and a value near 0 indicates little or no relationship of the kind the chosen method measures.
• There are several methods to measure the correlation coefficient. The pandas method Series.corr() supports calculating the correlation coefficient using the Pearson, Kendall and Spearman methods, selected through the method parameter. The method parameter also accepts a callable, so any other custom method can be used. The custom function calculating the correlation coefficient should take two one-dimensional ndarray objects as parameters and should return a float.
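
The callable form of the method parameter can be sketched as follows. This is a minimal illustration, not a recommended replacement for the built-in methods: the function name pearson_from_formula and the sample values are assumptions made up for this sketch. The function computes Pearson's r directly from its definition, so its result can be compared with method="pearson".

```python
import numpy as np
import pandas as pd


def pearson_from_formula(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical custom method: Pearson's r from its definition."""
    # Deviations of each value from the mean of its own array
    da = a - a.mean()
    db = b - b.mean()
    # Sum of co-deviations divided by the product of the deviation norms
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))


# Illustrative sample data (not taken from the article)
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])

# Pass the function itself as the method argument
custom_r = x.corr(y, method=pearson_from_formula)
builtin_r = x.corr(y, method="pearson")

print(round(custom_r, 4))
print(round(builtin_r, 4))
```

Both calls should print the same value, since the custom function reimplements the Pearson formula. A custom callable is mainly useful when the required measure is not one of the three built-in methods.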

## Example:

    # Python example to find the correlation coefficient
    # of two variables represented by two pandas Series instances
    import pandas as pd

    # Prices of the house
    housePriceList = [250, 265, 270, 262, 268, 272]

    # The years
    yearList = [2014, 2015, 2016, 2017, 2018, 2019]

    # House prices loaded into a pandas Series
    housePrices = pd.Series(housePriceList)

    # Years loaded into a pandas Series
    years = pd.Series(yearList)

    # Find the correlation coefficient between house price and year
    corr_value = housePrices.corr(years, method="pearson")
    print("Correlation coefficient between house price and year (Method:Pearson)")
    print(round(corr_value, 2))

    corr_value = housePrices.corr(years, method="kendall")
    print("Correlation coefficient between house price and year (Method:Kendall rank correlation coefficient)")
    print(round(corr_value, 2))

    corr_value = housePrices.corr(years, method="spearman")
    print("Correlation coefficient between house price and year (Method:Spearman rank correlation coefficient)")
    print(round(corr_value, 2))

## Output:

    Correlation coefficient between house price and year (Method:Pearson)
    0.75
    Correlation coefficient between house price and year (Method:Kendall rank correlation coefficient)
    0.6
    Correlation coefficient between house price and year (Method:Spearman rank correlation coefficient)
    0.71
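
As a further sketch, the same house price data can be placed in a DataFrame to see what DataFrame.corr() returns. The column names year, price, price_halved and price_negated are assumptions made for illustration. The two derived columns are exact linear functions of price, so they also show that a Pearson coefficient of +1 or -1 does not require a one-to-one change between the variables.

```python
import pandas as pd

# Same data as the Series example, arranged as DataFrame columns
df = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017, 2018, 2019],
    "price": [250, 265, 270, 262, 268, 272],
})

# Hypothetical derived columns: exact linear functions of price
df["price_halved"] = df["price"] * 0.5 + 10   # rises 0.5 per unit of price
df["price_negated"] = df["price"] * -2        # falls 2 per unit of price

# Pairwise Pearson correlation between all columns (a 4 x 4 matrix)
corr_matrix = df.corr(method="pearson")
print(corr_matrix.round(2))

# A single entry matches the Series-based calculation
price_year = corr_matrix.loc["price", "year"]
print(round(price_year, 2))
```

The diagonal of the matrix is 1 because every column is perfectly correlated with itself, and the price/year entry matches the value obtained from Series.corr() in the example above.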