Overview:
- The diff() method of a pandas DataFrame computes the difference between its rows or between its columns.
- The axis parameter decides the direction: axis=0 (the default) computes differences between rows, and axis=1 computes differences between columns.
- When periods is positive, each row has an earlier row subtracted from it. With the default periods=1, every row is reduced by the row immediately before it.
- When periods is negative, each row has a later row subtracted from it. With periods=-1, every row is reduced by the row immediately after it.
- In the same way, when axis=1, the sign of periods decides whether each column is compared with a column to its left (positive) or to its right (negative), and its magnitude decides how far away that column is.
- At the DataFrame boundaries there is no previous or next row (or column) to subtract, so the first |periods| rows (or the last |periods| rows, for a negative periods value) are NaN. Because of these NaN values, integer columns are returned as floats, which is why the results show values such as 8.0.
- When the magnitude of periods is n > 1, each row is compared with the row n positions away, skipping the n-1 rows in between. For example, with periods=2, row 3 of the result is row 3 minus row 1.
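For numeric data, the behavior described above can be reproduced by subtracting a shifted copy of the frame: diff(periods=n) matches frame - frame.shift(n), and the positions vacated by shift() are exactly where diff() returns NaN. The sketch below checks this on a small made-up frame for both axes; the frame and the helper `diff_via_shift` are hypothetical illustrations, not part of pandas or of the examples in this article.

```python
import pandas as pd


def diff_via_shift(frame: pd.DataFrame, periods: int = 1, axis: int = 0) -> pd.DataFrame:
    """Hypothetical helper: each element minus the element `periods` steps back along `axis`."""
    # shift() moves the data by `periods` positions and fills the vacated
    # positions with NaN, which is why diff() yields NaN at the boundaries.
    return frame - frame.shift(periods, axis=axis)


# Small illustrative frame (made-up values)
frame = pd.DataFrame({"P": [1, 4, 9, 16], "Q": [2, 3, 5, 7]})

# Compare against diff() for both directions and both axes
for axis in (0, 1):
    for periods in (1, 2, -1, -2):
        expected = frame.diff(periods=periods, axis=axis)
        actual = diff_via_shift(frame, periods=periods, axis=axis)
        pd.testing.assert_frame_equal(actual, expected)

# Row i minus row i-2; the first two rows are NaN
print(diff_via_shift(frame, periods=2))
```

The shift-based view is only a way to reason about diff(); in practice calling diff() directly is simpler.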
Example: Finding difference between rows of a pandas DataFrame
import pandas as pd
dataset = [(2, 4, 6, 8), (10, 12, 14, 18), (20, 22, 24, 26), (28, 30, 32, 34)]
dataFrame = pd.DataFrame(dataset, columns=["A", "B", "C", "D"])
print("Dataset:")
print(dataFrame)

# Calculate the difference between rows - by default, periods = 1
difference = dataFrame.diff(axis=0)
print("Difference between rows(Period=1):")
print(difference)

# Calculate the difference between rows - periods = 2
difference = dataFrame.diff(periods=2)
print("Difference between rows(Period=2):")
print(difference)

# Calculate the difference between rows - periods = -1
difference = dataFrame.diff(periods=-1)
print("Difference between rows(Period=-1):")
print(difference)

# Calculate the difference between rows - periods = -2
difference = dataFrame.diff(periods=-2)
print("Difference between rows(Period=-2):")
print(difference)
Output:
Dataset:
    A   B   C   D
0   2   4   6   8
1  10  12  14  18
2  20  22  24  26
3  28  30  32  34
Difference between rows(Period=1):
      A     B     C     D
0   NaN   NaN   NaN   NaN
1   8.0   8.0   8.0  10.0
2  10.0  10.0  10.0   8.0
3   8.0   8.0   8.0   8.0
Difference between rows(Period=2):
      A     B     C     D
0   NaN   NaN   NaN   NaN
1   NaN   NaN   NaN   NaN
2  18.0  18.0  18.0  18.0
3  18.0  18.0  18.0  16.0
Difference between rows(Period=-1):
       A      B      C      D
0   -8.0   -8.0   -8.0  -10.0
1  -10.0  -10.0  -10.0   -8.0
2   -8.0   -8.0   -8.0   -8.0
3    NaN    NaN    NaN    NaN
Difference between rows(Period=-2):
       A      B      C      D
0  -18.0  -18.0  -18.0  -18.0
1  -18.0  -18.0  -18.0  -16.0
2    NaN    NaN    NaN    NaN
3    NaN    NaN    NaN    NaN
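The row outputs above show that a negative period mirrors a positive one: for each n, the periods=-n result equals the periods=n result negated and shifted up by n rows, because both express row i minus row i+n. The short check below confirms this on the same dataset using only diff() and shift(); it is a sketch for building intuition, not a separate pandas feature.

```python
import pandas as pd

# Same dataset as the row example
dataFrame = pd.DataFrame(
    [(2, 4, 6, 8), (10, 12, 14, 18), (20, 22, 24, 26), (28, 30, 32, 34)],
    columns=["A", "B", "C", "D"],
)

for n in (1, 2):
    backward = dataFrame.diff(periods=-n)  # row i minus row i+n
    forward = dataFrame.diff(periods=n)    # row i minus row i-n
    # Negating the forward result and moving it up n rows also gives row i minus row i+n
    pd.testing.assert_frame_equal(backward, -forward.shift(-n))

# Row 1 with periods=-2 is row 1 minus row 3
print(dataFrame.diff(periods=-2).loc[1].tolist())  # [-18.0, -18.0, -18.0, -16.0]
```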
Example: Finding difference between columns of a pandas DataFrame
import pandas as pd
dataValues = [(3, 15, 30, 48), (6, 20, 33, 45), (9, 25, 36, 42), (12, 30, 40, 39)]
dataFrameObject = pd.DataFrame(dataValues, columns=["W", "X", "Y", "Z"])
print("Dataset:")
print(dataFrameObject)

# Calculate the difference between columns - by default, periods = 1
differenceFrame = dataFrameObject.diff(axis=1)
print("Difference between columns(Period=1):")
print(differenceFrame)

# Calculate the difference between columns - periods = 2
differenceFrame = dataFrameObject.diff(axis=1, periods=2)
print("Difference between columns(Period=2):")
print(differenceFrame)

# Calculate the difference between columns - periods = -1
differenceFrame = dataFrameObject.diff(axis=1, periods=-1)
print("Difference between columns(Period=-1):")
print(differenceFrame)

# Calculate the difference between columns - periods = -2
differenceFrame = dataFrameObject.diff(axis=1, periods=-2)
print("Difference between columns(Period=-2):")
print(differenceFrame)
Output:
Dataset:
    W   X   Y   Z
0   3  15  30  48
1   6  20  33  45
2   9  25  36  42
3  12  30  40  39
Difference between columns(Period=1):
     W     X     Y     Z
0  NaN  12.0  15.0  18.0
1  NaN  14.0  13.0  12.0
2  NaN  16.0  11.0   6.0
3  NaN  18.0  10.0  -1.0
Difference between columns(Period=2):
     W    X     Y     Z
0  NaN  NaN  27.0  33.0
1  NaN  NaN  27.0  25.0
2  NaN  NaN  27.0  17.0
3  NaN  NaN  28.0   9.0
Difference between columns(Period=-1):
       W      X      Y    Z
0  -12.0  -15.0  -18.0  NaN
1  -14.0  -13.0  -12.0  NaN
2  -16.0  -11.0   -6.0  NaN
3  -18.0  -10.0    1.0  NaN
Difference between columns(Period=-2):
       W      X    Y    Z
0  -27.0  -33.0  NaN  NaN
1  -27.0  -25.0  NaN  NaN
2  -27.0  -17.0  NaN  NaN
3  -28.0   -9.0  NaN  NaN