Finding the difference between rows and columns of a pandas DataFrame

Overview:

  • The diff() method of a pandas DataFrame computes the difference between rows or between columns.
  • The axis parameter selects the direction: axis=0 (the default) computes differences between rows, and axis=1 computes differences between columns.
  • When the periods parameter is positive, each result is the current row minus the row that lies periods positions before it. With the default periods=1, that is the current row minus the previous row.
  • When the periods parameter is negative, each result is the current row minus the row that lies |periods| positions after it. With periods=-1, that is the current row minus the next row.
  • Similarly, when axis=1, the sign of periods sets the direction of travel across the columns and its magnitude sets how many columns apart the two operands are.
  • At the DataFrame boundaries the required previous or next row or column does not exist, so the result there is NaN.
  • When the magnitude n of the periods parameter is greater than 1, the (n-1) rows or columns that lie between the two operands are skipped.
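The rules above can be checked against the shift() method: subtracting a shifted copy of a DataFrame from the original gives the same numbers as diff(). The following is a minimal sketch, not the library's implementation. The sample data and the helper same_values() are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this sketch
frame = pd.DataFrame({"P": [1, 4, 9, 16], "Q": [2, 3, 5, 8]})


def same_values(left, right):
    """Hypothetical helper: True when both frames hold equal numbers (NaN matches NaN)."""
    return bool(np.allclose(left.to_numpy(dtype=float),
                            right.to_numpy(dtype=float),
                            equal_nan=True))


# Positive periods: current row minus the row `periods` positions before it
print(same_values(frame.diff(periods=1), frame - frame.shift(periods=1)))

# Negative periods: current row minus the row `|periods|` positions after it
print(same_values(frame.diff(periods=-1), frame - frame.shift(periods=-1)))

# axis=1: the same subtraction, applied across columns instead of rows
print(same_values(frame.diff(axis=1), frame - frame.shift(periods=1, axis=1)))
```

Each check is expected to print True. The comparison converts both results to floating point first, because the column types returned by diff() can vary between pandas versions.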

 

Example: Finding the difference between rows of a pandas DataFrame

import pandas as pd

dataset = [(2, 4, 6, 8),
           (10, 12, 14, 18),
           (20, 22, 24, 26),
           (28, 30, 32, 34)]

dataFrame = pd.DataFrame(dataset, columns=("A", "B", "C", "D"))
print("Dataset:")
print(dataFrame)

# Calculate the difference between rows - By default, periods = 1
difference = dataFrame.diff(axis=0)
print("Difference between rows(Period=1):")
print(difference)

# Calculate the difference between rows - periods = 2
difference = dataFrame.diff(periods=2)
print("Difference between rows(Period=2):")
print(difference)

# Calculate the difference between rows - periods = -1
difference = dataFrame.diff(periods=-1)
print("Difference between rows(Period=-1):")
print(difference)

# Calculate the difference between rows - periods = -2
difference = dataFrame.diff(periods=-2)
print("Difference between rows(Period=-2):")
print(difference)

 

Output:

Dataset:
    A   B   C   D
0   2   4   6   8
1  10  12  14  18
2  20  22  24  26
3  28  30  32  34
Difference between rows(Period=1):
      A     B     C     D
0   NaN   NaN   NaN   NaN
1   8.0   8.0   8.0  10.0
2  10.0  10.0  10.0   8.0
3   8.0   8.0   8.0   8.0
Difference between rows(Period=2):
      A     B     C     D
0   NaN   NaN   NaN   NaN
1   NaN   NaN   NaN   NaN
2  18.0  18.0  18.0  18.0
3  18.0  18.0  18.0  16.0
Difference between rows(Period=-1):
      A     B     C     D
0  -8.0  -8.0  -8.0 -10.0
1 -10.0 -10.0 -10.0  -8.0
2  -8.0  -8.0  -8.0  -8.0
3   NaN   NaN   NaN   NaN
Difference between rows(Period=-2):
      A     B     C     D
0 -18.0 -18.0 -18.0 -18.0
1 -18.0 -18.0 -18.0 -16.0
2   NaN   NaN   NaN   NaN
3   NaN   NaN   NaN   NaN
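The NaN rows at the boundaries often need handling before further processing. The sketch below reuses the dataset from this example and shows two common options, dropping the boundary rows or filling them with zero. Which one is appropriate depends on the use case, and the choice of zero as the fill value is an assumption made here for illustration.

```python
import pandas as pd

dataset = [(2, 4, 6, 8),
           (10, 12, 14, 18),
           (20, 22, 24, 26),
           (28, 30, 32, 34)]
frame = pd.DataFrame(dataset, columns=("A", "B", "C", "D"))

# With periods=1 the first row has no previous row, so it is all NaN
row_diff = frame.diff()

# Option 1: drop the rows for which no difference could be computed
trimmed = row_diff.dropna()
print(trimmed)

# Option 2: treat the missing difference as 0 (an assumption) and
# convert the result back to integers
filled = row_diff.fillna(0).astype(int)
print(filled)
```

Dropping keeps only rows 1 to 3, while filling keeps the original shape of the DataFrame.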

 

 

 

 

Example: Finding the difference between columns of a pandas DataFrame

import pandas as pd

dataValues = [(3, 15, 30, 48),
              (6, 20, 33, 45),
              (9, 25, 36, 42),
              (12, 30, 40, 39)]

dataFrameObject = pd.DataFrame(dataValues, columns=("W", "X", "Y", "Z"))
print("Dataset:")
print(dataFrameObject)

# Calculate the difference between columns - By default, periods = 1
differenceFrame = dataFrameObject.diff(axis=1)
print("Difference between columns(Period=1):")
print(differenceFrame)

# Calculate the difference between columns - periods = 2
differenceFrame = dataFrameObject.diff(axis=1, periods=2)
print("Difference between columns(Period=2):")
print(differenceFrame)

# Calculate the difference between columns - periods = -1
differenceFrame = dataFrameObject.diff(axis=1, periods=-1)
print("Difference between columns(Period=-1):")
print(differenceFrame)

# Calculate the difference between columns - periods = -2
differenceFrame = dataFrameObject.diff(axis=1, periods=-2)
print("Difference between columns(Period=-2):")
print(differenceFrame)

 

 

Output:

Dataset:
    W   X   Y   Z
0   3  15  30  48
1   6  20  33  45
2   9  25  36  42
3  12  30  40  39
Difference between columns(Period=1):
    W     X     Y     Z
0 NaN  12.0  15.0  18.0
1 NaN  14.0  13.0  12.0
2 NaN  16.0  11.0   6.0
3 NaN  18.0  10.0  -1.0
Difference between columns(Period=2):
    W   X     Y     Z
0 NaN NaN  27.0  33.0
1 NaN NaN  27.0  25.0
2 NaN NaN  27.0  17.0
3 NaN NaN  28.0   9.0
Difference between columns(Period=-1):
      W     X     Y   Z
0 -12.0 -15.0 -18.0 NaN
1 -14.0 -13.0 -12.0 NaN
2 -16.0 -11.0  -6.0 NaN
3 -18.0 -10.0   1.0 NaN
Difference between columns(Period=-2):
      W     X   Y   Z
0 -27.0 -33.0 NaN NaN
1 -27.0 -25.0 NaN NaN
2 -27.0 -17.0 NaN NaN
3 -28.0  -9.0 NaN NaN
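The column-wise results can be cross-checked in two ways: by subtracting neighbouring columns explicitly, and by transposing the DataFrame, taking row differences, and transposing back. The sketch below reuses the dataset from this example. The helper same_values() is made up for illustration, and the code is only a verification aid, not a description of how pandas computes the result internally.

```python
import numpy as np
import pandas as pd

dataValues = [(3, 15, 30, 48),
              (6, 20, 33, 45),
              (9, 25, 36, 42),
              (12, 30, 40, 39)]
frame = pd.DataFrame(dataValues, columns=("W", "X", "Y", "Z"))


def same_values(left, right):
    """Hypothetical helper: True when both frames hold equal numbers (NaN matches NaN)."""
    return bool(np.allclose(left.to_numpy(dtype=float),
                            right.to_numpy(dtype=float),
                            equal_nan=True))


# Difference between columns with periods=1
col_diff = frame.diff(axis=1)

# Check 1: column X of the result is column X minus column W
x_minus_w = frame["X"] - frame["W"]
print(x_minus_w.tolist())

# Check 2: transpose, take row differences, transpose back
via_transpose = frame.T.diff().T
print(same_values(col_diff, via_transpose))
```

The explicit subtraction gives 12, 14, 16 and 18, which matches column X of the Period=1 output above.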


Copyright 2024 © pythontic.com