## Overview:

- Most data analysis done with the Python library **pandas** involves the data structures Series and DataFrame. A pandas.Series is a one-dimensional, labeled, mutable array whose elements share a single dtype (which may be `object` when mixed values are stored), while a pandas.DataFrame is a two-dimensional, mutable table whose columns may have different dtypes. Both Series and DataFrame are typically backed by NumPy's ndarray as the underlying data structure.
- The classes pandas.Series and pandas.DataFrame provide methods for holding and reshaping the data and for performing statistical and mathematical operations on it.
- The method **Series.corr()** finds the correlation between two variables represented by two pandas.Series instances. The DataFrame.corr() method computes the pairwise correlation coefficients between the columns of a pandas.DataFrame. **Correlation is a statistical measure** of how strongly two variables are related, if a relationship exists between them at all. Examples include per capita income and life expectancy, or forest coverage and annual rainfall of a region. Correlation is measured by the **correlation coefficient (r)**.
- The **value of the correlation coefficient** is always in the **range of -1 to +1**.
- When the correlation coefficient is positive, the two variables are correlated in the **positive direction**: as one variable increases, the other tends to increase as well. A coefficient of exactly +1 is a perfect positive relationship (for Pearson, the points lie on a straight line with positive slope). The increments need not be equal: if one variable increases by 0.5 every time the other increases by 1, the correlation is still +1. When the correlation coefficient is -1, the two variables are perfectly negatively correlated: as one variable increases, the other moves in the **negative direction**, that is, it decreases. A coefficient near 0 indicates little or no linear relationship.
- There are several methods to measure the correlation coefficient. The pandas method **Series.corr()** supports calculating the correlation coefficient using the methods **Pearson**, **Kendall** and **Spearman**, selected through the `method` parameter (`"pearson"` is the default). It also accepts a custom method by passing a callable as `method`. The custom function calculating the correlation coefficient should take two one-dimensional ndarray objects as parameters and should return a float.
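
The points above can be checked with a small sketch. The data and the helper name `pearson_via_numpy` are made up for illustration and are not part of pandas. The sketch shows that unequal increments still give a coefficient of +1, that a falling relationship gives -1, that a callable can be passed as `method`, and that `DataFrame.corr()` returns a pairwise matrix.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# y_up rises by 0.5 for each unit of x: unequal increments, still a perfect line
y_up = 0.5 * x + 3.0
# y_down falls as x rises: perfect negative relationship
y_down = -2.0 * x + 10.0

r_up = x.corr(y_up)      # Pearson is the default method
r_down = x.corr(y_down)


def pearson_via_numpy(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical custom method: two 1-D ndarrays in, one float out."""
    return float(np.corrcoef(a, b)[0, 1])


r_custom = x.corr(y_up, method=pearson_via_numpy)

# DataFrame.corr() computes the coefficient for every pair of columns
df = pd.DataFrame({"x": x, "y_up": y_up, "y_down": y_down})
corr_matrix = df.corr(method="pearson")

print(r_up, r_down, r_custom)
print(corr_matrix)
```

Up to floating-point rounding, `r_up` and `r_custom` come out as +1 and `r_down` as -1, and every entry of `corr_matrix` is either +1 or -1 because all three columns are exact linear functions of each other.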

## Example:

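    # Python example to find the correlation coefficient
    import pandas as pd

    # Prices of house and the years (the data values are not shown in this listing)
    # House prices loaded into a pandas Series
    housePrices = pd.Series(...)
    # Years loaded into a pandas Series
    years = pd.Series(...)

    # Find the correlation coefficient between house price and year
    corr_value = housePrices.corr(years, method="pearson")
    corr_value = housePrices.corr(years, method="kendall")
    corr_value = housePrices.corr(years, method="spearman")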

## Output:

    Correlation coefficient between house price and year (Method:Pearson) 0.75
    Correlation coefficient between house price and year (Method:Kendall rank correlation coefficient) 0.6
    Correlation coefficient between house price and year (Method:Spearman rank correlation coefficient) 0.71
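
The three methods report different values for the same data because Pearson measures linear association, while Kendall and Spearman work on ranks. The sketch below illustrates this with made-up numbers (not the data behind the output above) and hypothetical variable names. It relies on the fact that Spearman's coefficient equals Pearson's coefficient computed on the ranks, so only the default Pearson method is called. In the pandas versions I am aware of, `method="spearman"` and `method="kendall"` on a Series delegate to SciPy, which would then need to be installed.

```python
import pandas as pd

# Illustrative values only: prices rise every year, but not along a straight line
years = pd.Series([2010, 2011, 2012, 2013, 2014, 2015])
house_prices = pd.Series([100, 102, 105, 112, 130, 190])

# Pearson: strength of the linear relationship
pearson_r = house_prices.corr(years)

# Spearman computed as Pearson on the ranks of both variables
spearman_r = house_prices.rank().corr(years.rank())

print("Pearson :", pearson_r)
print("Spearman:", spearman_r)
```

Because the made-up prices increase strictly from year to year, the rank-based coefficient is essentially 1, while the Pearson coefficient is lower since the points do not lie on a straight line.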