Replacing pandas DataFrame elements

Overview:

  • The pandas DataFrame class provides several ways to replace one or more elements of a DataFrame: the loc and iloc indexers, and the methods mask() and replace().

  • Together, these allow elements to be replaced based on labels, integer positions, Boolean conditions, regular expressions, and explicitly specified values.

  • Using the mask() method, the elements of a pandas DataFrame that satisfy a Boolean condition can be replaced with values from another DataFrame or with values returned by a function.

  • By default, the mask() method uses NaN as the replacement value. For example, every entry greater than 10 can be replaced with NaN, or with a value supplied by another DataFrame or by a function.
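
The function form mentioned above can be sketched as follows. This is a minimal illustration, not part of the original examples: the column names "a" and "b" and the sample numbers are made up, and the lambda is just one possible replacement rule. The callable passed as other receives the DataFrame and returns the candidate replacement values; only the positions where the condition is True take them.

```python
import pandas as pds

# Hypothetical sample data for illustration
df = pds.DataFrame({"a": [5, 12, 8],
                    "b": [20, 3, 11]})

# Where an entry is greater than 10, replace it with (entry - 10).
# The callable is applied to the DataFrame to compute the replacements.
capped = df.mask(df > 10, other=lambda frame: frame - 10)

print(capped)
```

mask() returns a new DataFrame here; df itself is left unchanged.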

Example 1 - Replacing elements with the default replacement value (NaN):

# Example Python program that replaces the elements of a DataFrame
# using a Boolean condition. No replacement values are supplied, so
# the mask() method replaces the matching elements with NaN.

import pandas as pds

# Python dictionary of hourly readings
hourlyReadings = {"10.00":[25.5, 26.5, 29.0, 26.0],
                   "11.00":[27.0, 27.1, 27.2, 30.1],
                   "12.00":[27.1, 27.4, 29.5, 28.1]
                     };
# Make a pandas DataFrame
df = pds.DataFrame(data=hourlyReadings);
print("Original DataFrame:");
print(df);

# Replace values greater than 29.0 with NaN
replaced=df.mask(df > 29.0);
print("DataFrame after replacing the matching entries with NaN:");
print(replaced);

Output:

Original DataFrame:
   10.00  11.00  12.00
0   25.5   27.0   27.1
1   26.5   27.1   27.4
2   29.0   27.2   29.5
3   26.0   30.1   28.1
DataFrame after replacing the matching entries with NaN:
   10.00  11.00  12.00
0   25.5   27.0   27.1
1   26.5   27.1   27.4
2   29.0   27.2    NaN
3   26.0    NaN   28.1

Example 2 - Replacing elements with values from another DataFrame:

# Example Python program that replaces element values of a DataFrame
# using another DataFrame based on a Boolean condition
import pandas as pds

# Create the DataFrame whose values
# will be replaced
frame1 = {"C1":[1.0, 1.0, 1.1, 1.0],
           "C2":[1.0, 1.1, 1.1, 1.0],
           "C3":[1.1, 1.0, 1.0, 1.0]
         };
df = pds.DataFrame(data=frame1);

print("DataFrame:");
print(df);

# DataFrame that supplies replacement values
frame2 = {"C1":[0, 0, 0, 0],
           "C2":[0, 0, 0, 0],
           "C3":[0, 0, 0, 0]
         };

repValues = pds.DataFrame(data=frame2);

# Replace values using a Boolean expression
replaced = df.mask(df > 1.0, other = repValues);
print("New DataFrame after replacement:");
print(replaced);

Output:

DataFrame:
    C1   C2   C3
0  1.0  1.0  1.1
1  1.0  1.1  1.0
2  1.1  1.1  1.0
3  1.0  1.0  1.0
New DataFrame after replacement:
    C1   C2   C3
0  1.0  1.0  0.0
1  1.0  0.0  1.0
2  0.0  0.0  1.0
3  1.0  1.0  1.0
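
When every replacement is the same value, a second DataFrame is not strictly needed. The sketch below assumes the same data as Example 2 and passes a scalar as other, which should give the same result as the all-zeros DataFrame.

```python
import pandas as pds

df = pds.DataFrame({"C1": [1.0, 1.0, 1.1, 1.0],
                    "C2": [1.0, 1.1, 1.1, 1.0],
                    "C3": [1.1, 1.0, 1.0, 1.0]})

# A scalar is used as the replacement for every matching element
replaced = df.mask(df > 1.0, other=0.0)

print(replaced)
```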

Example 3 - Explicitly replacing a set of values through the replace() method:

# Example Python program that replaces
# a set of values with another set of values
# on a pandas DataFrame
import pandas as pds

# Create a DataFrame
matrix1 = [(1,2,3),
          (4,5,6),
          (7,8,9)];

df = pds.DataFrame(data=matrix1);
print("DataFrame:");
print(df);

# Replace the values 1, 2 and 3 with zeros wherever they occur.
# Here they all happen to be in the first row.
df1 = df.replace([1,2,3],[0,0,0]);
print("DataFrame after replacing the first row of values:");
print(df1);

Output:

DataFrame:
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9
DataFrame after replacing the first row of values:
   0  1  2
0  0  0  0
1  4  5  6
2  7  8  9
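
Note that replace() matches by value, not by position. The sketch below uses a made-up matrix in which the values 1, 2 and 3 appear in two rows, and a dictionary that maps each old value to its new value; both rows change.

```python
import pandas as pds

# Hypothetical data: the values 1, 2, 3 occur in rows 0 and 1
df = pds.DataFrame(data=[(1, 2, 3),
                         (3, 2, 1),
                         (7, 8, 9)])

# Dictionary form: {old value: new value}
df1 = df.replace({1: 0, 2: 0, 3: 0})

print(df1)
```

To change only one row regardless of its contents, a position-based or label-based assignment through iloc or loc is the more direct tool.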

Example 4 - Replacing the string elements of a DataFrame using a regular expression:

# Example Python program that uses a regular expression
# to replace the occurrences of specific letters
import pandas as pds

tokens = [("The", "quick", "brown"),
          ("fox", "jumps", "over"),
          ("the", "lazy", "dog")];
df = pds.DataFrame(data = tokens);
print("DataFrame with string literals:");
print(df);

# Replace every lowercase "t", "h" or "e" with an "a"
df1 = df.replace(regex = "[the]",
                    value = "a");
print("DataFrame received after replacing the possible elements using regex:");
print(df1);

Output:

DataFrame with string literals:
     0      1      2
0  The  quick  brown
1  fox  jumps   over
2  the   lazy    dog
DataFrame received after replacing the possible elements using regex:
     0      1      2
0  Taa  quick  brown
1  fox  jumps   ovar
2  aaa   lazy    dog
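
The character class in Example 4 replaces individual letters inside every string. To replace whole elements instead, the pattern can be anchored. The sketch below is an assumption about what such a variation might look like: the pattern ^[Tt]he$ and the replacement "a" are chosen only for illustration.

```python
import pandas as pds

tokens = [("The", "quick", "brown"),
          ("fox", "jumps", "over"),
          ("the", "lazy", "dog")]
df = pds.DataFrame(data=tokens)

# ^ and $ anchor the match to the whole string, so only the
# elements "The" and "the" are replaced
df1 = df.replace(regex=r"^[Tt]he$", value="a")

print(df1)
```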

Example 5 - Replacing an element of a DataFrame using labels:

# Example Python program that replaces
# a DataFrame element using labels

import pandas as pds

# A list of tuples
values = [("a", "b", "c"),
          ("d", "e", "f"),
          ("g", "h", "i")];

# Create a DataFrame
df = pds.DataFrame(data = values,
                   columns = ("c1", "c2", "c3"),
                   index=("r1", "r2", "r3"));
print("DataFrame:");
print(df);

print("DataFrame after replacing a value through labels:");
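# Assign with a single .loc[row, column] call; chained indexing such as
# df.loc["r2"]["c2"] may modify a temporary copy instead of df
df.loc["r2", "c2"] = "x";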
print(df);

Output:

DataFrame:
   c1 c2 c3
r1  a  b  c
r2  d  e  f
r3  g  h  i
DataFrame after replacing a value through labels:
   c1 c2 c3
r1  a  b  c
r2  d  x  f
r3  g  h  i
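
The loc indexer also accepts a Boolean expression as the row selector, which covers the "boolean expressions" case mentioned in the overview. A minimal sketch, with made-up numeric data and the same row and column labels as Example 5:

```python
import pandas as pds

# Hypothetical numeric data for illustration
df = pds.DataFrame(data={"c1": [10, 40, 70],
                         "c2": [20, 50, 80]},
                   index=("r1", "r2", "r3"))

# In every row where c1 is greater than 30, set c2 to 0
df.loc[df["c1"] > 30, "c2"] = 0

print(df)
```

Unlike mask(), this assignment modifies df in place.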

Example 6 - Replacing an element of a DataFrame using its indices:

# Example Python program that replaces
# a DataFrame element using its indices

import pandas as pds

# A list of tuples
values = [(10, 20, 30),
          (40, 50, 60),
          (70, 80, 90)];

# DataFrame creation
df = pds.DataFrame(data = values);
print("DataFrame:");
print(df);

print("DataFrame after replacing a value through its indices:");
# Assign with a single .iloc[row, column] call and keep the value an integer
df.iloc[1, 1] = -1;
print(df);

Output:

DataFrame:
    0   1   2
0  10  20  30
1  40  50  60
2  70  80  90
DataFrame after replacing a value through its indices:
    0   1   2
0  10  20  30
1  40  -1  60
2  70  80  90
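
iloc can replace more than one element at a time when given slices. The sketch below reuses the data from Example 6; the particular slices and replacement values are chosen only for illustration.

```python
import pandas as pds

df = pds.DataFrame(data=[(10, 20, 30),
                         (40, 50, 60),
                         (70, 80, 90)])

# Replace the whole first row (position 0) with zeros
df.iloc[0, :] = 0

# Replace the last column (position 2) in rows 1 and 2 with -1
df.iloc[1:, 2] = -1

print(df)
```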

 


Copyright 2024 © pythontic.com