Calculating central tendency and dispersion values on the rolling windows of a pandas DataFrame

Overview:

  • The measures of central tendency (mean, median and mode) describe where the values of a dataset are concentrated, that is, what a typical value looks like. Similarly, the measures of dispersion (variance and standard deviation) describe how much the data varies around its mean.
  • The methods mean() and median(), invoked on a Rolling object obtained from a pandas DataFrame, calculate the mean and the median of each window. The methods var() and std() calculate the variance and the standard deviation of the data in each rolling window; by default both are sample statistics (ddof=1, divisor n - 1).
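The overview lists the mode as a measure of central tendency, but pandas' Rolling object does not provide a mode() method. A minimal sketch, assuming a custom function passed to Rolling.apply() is acceptable, computes a rolling mode by calling Series.mode() on each window. The helper name window_mode and the sample series are illustrative only. When a window has several modes, Series.mode() returns them sorted, so this sketch picks the smallest one.

```python
import pandas as pd

# Hypothetical helper: return the mode of one window.
# raw=True below passes each window as a NumPy array.
def window_mode(values):
    # Series.mode() returns all modes in sorted order; take the smallest
    return pd.Series(values).mode().iloc[0]

# Illustrative data, not taken from the example below
s = pd.Series([1, 2, 2, 3, 3, 3])

# Rolling mode over windows of three values
rolling_mode = s.rolling(3).apply(window_mode, raw=True)
print(rolling_mode)
```

The first two entries are NaN because a full three-value window is not yet available, exactly as with mean() and median().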

Example:

  • Assuming price1 and price2 are the trade prices of a stock taken from two different execution venues, the frames produced by the rolling window calculations give, for each window, the mean price, the median price, the variance of the price and its standard deviation.
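Because var() and std() on a rolling window return sample statistics by default, the values differ from the population variance. A short sketch, using the price1 values of the first full three-row window from the example (24.5, 24.1, 25.2), compares the rolling result with Python's statistics module and shows the population variance obtained by passing ddof=0.

```python
import math
import statistics

import pandas as pd

# price1 values of the first full three-row window in the example
prices = pd.Series([24.5, 24.1, 25.2])

# Default: sample variance, divisor n - 1 (ddof=1)
sample_var = prices.rolling(3).var().iloc[-1]
# Population variance, divisor n (ddof=0)
population_var = prices.rolling(3).var(ddof=0).iloc[-1]

print(sample_var)        # sum of squared deviations 0.62 divided by 2
print(population_var)    # sum of squared deviations 0.62 divided by 3
print(math.isclose(sample_var, statistics.variance(prices)))
print(math.isclose(population_var, statistics.pvariance(prices)))
```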

# Example Python program that 
# uses pandas rolling window functions
# to calculate central tendency measures and dispersion
# measures

import pandas as pd

# Create a DataFrame
data = [(pd.Timestamp(1656005844, unit='s'), 24.5, 23.1),
        (pd.Timestamp(1656005845, unit='s'), 24.1, 23.5),
        (pd.Timestamp(1656005846, unit='s'), 25.2, 24.3),
        (pd.Timestamp(1656005847, unit='s'), 25.3, 23.2)]

df = pd.DataFrame(data=data, columns=["time", "price1", "price2"])
print("DataFrame:")
print(df)

# Get rolling windows of three rows
numRows = 3
r = df.rolling(numRows)
print(r)

# Calculate mean on the rolling window of prices
r1 = r[["price1", "price2"]].mean()
r1.columns = ["mean price1", "mean price2"]
r1.index = df["time"]
print(r1)

# Calculate the median
r2 = r[["price1", "price2"]].median()
r2.columns = ["median price1", "median price2"]
r2.index = df["time"]
print(r2)

# Calculate the variance
r3 = r[["price1", "price2"]].var()
r3.columns = ["price1 variance", "price2 variance"]
r3.index = df["time"]
print(r3)

# Calculate the standard deviation
r4 = r[["price1", "price2"]].std()
r4.columns = ["price1 std", "price2 std"]
r4.index = df["time"]
print(r4)
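The program above makes four separate calls and re-assigns the index after each one. A more compact sketch, assuming a single result frame with two-level column labels is acceptable, moves the time column into the index first and computes all four statistics with one Rolling.agg() call. The variable names prices and stats are illustrative.

```python
import pandas as pd

data = [(pd.Timestamp(1656005844, unit='s'), 24.5, 23.1),
        (pd.Timestamp(1656005845, unit='s'), 24.1, 23.5),
        (pd.Timestamp(1656005846, unit='s'), 25.2, 24.3),
        (pd.Timestamp(1656005847, unit='s'), 25.3, 23.2)]
df = pd.DataFrame(data=data, columns=["time", "price1", "price2"])

# Use the timestamps as the index so every result is labelled by time
prices = df.set_index("time")[["price1", "price2"]]

# One call computes every statistic for every column; the result has
# two-level columns such as ("price1", "mean") and ("price2", "std")
stats = prices.rolling(3).agg(["mean", "median", "var", "std"])
print(stats)

# Select a single statistic for both price columns
print(stats.xs("std", axis=1, level=1))
```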

Output:

DataFrame:
                 time  price1  price2
0 2022-06-23 17:37:24    24.5    23.1
1 2022-06-23 17:37:25    24.1    23.5
2 2022-06-23 17:37:26    25.2    24.3
3 2022-06-23 17:37:27    25.3    23.2
Rolling [window=3,center=False,axis=0,method=single]
                     mean price1  mean price2
time                                         
2022-06-23 17:37:24          NaN          NaN
2022-06-23 17:37:25          NaN          NaN
2022-06-23 17:37:26    24.600000    23.633333
2022-06-23 17:37:27    24.866667    23.666667
                     median price1  median price2
time                                             
2022-06-23 17:37:24            NaN            NaN
2022-06-23 17:37:25            NaN            NaN
2022-06-23 17:37:26           24.5           23.5
2022-06-23 17:37:27           25.2           23.5
                     price1 variance  price2 variance
time                                                 
2022-06-23 17:37:24              NaN              NaN
2022-06-23 17:37:25              NaN              NaN
2022-06-23 17:37:26         0.310000         0.373333
2022-06-23 17:37:27         0.443333         0.323333
                     price1 std  price2 std
time                                       
2022-06-23 17:37:24         NaN         NaN
2022-06-23 17:37:25         NaN         NaN
2022-06-23 17:37:26    0.556776    0.611010
2022-06-23 17:37:27    0.665833    0.568624
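The first two rows of every result are NaN because, with an integer window, min_periods defaults to the window size, so a statistic is only produced once three rows are available. If partial windows at the start of the series are acceptable, passing min_periods=1 is one option; a sketch on the price1 column follows. Even then, var() and std() return NaN for a one-row window, since the sample variance divides by n - 1.

```python
import pandas as pd

# price1 values from the example DataFrame
price1 = pd.Series([24.5, 24.1, 25.2, 25.3])

# Allow windows with fewer than three rows at the start of the series
partial_mean = price1.rolling(3, min_periods=1).mean()
partial_var = price1.rolling(3, min_periods=1).var()

print(partial_mean)  # first two values are means of one and two rows
print(partial_var)   # first value is NaN: one row has no sample variance
```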

 


Copyright 2024 © pythontic.com