Overview:
- A pandas DataFrame is a 2-dimensional, heterogeneous container built on top of the NumPy ndarray.
- Data processing often requires removing unwanted rows and/or columns from a DataFrame and creating a new DataFrame from the remaining data.
Remove rows and columns of DataFrame using drop():
- Specific rows and columns can be removed from a DataFrame object using the drop() instance method.
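- The drop() method accepts an axis parameter: 0 (or "index") for rows and 1 (or "columns") for columns. As an alternative to axis, the index parameter can be used to specify the rows to remove and the columns parameter to specify the columns to remove. By default, drop() returns a new DataFrame and leaves the original unchanged.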
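The two calling styles described above can be compared side by side. The following is a minimal sketch, assuming a small hypothetical DataFrame whose labels (r1 to r3 and A to C) are chosen only for illustration. It checks that passing labels with axis gives the same result as passing them through index or columns.

```python
import pandas as pds

# Hypothetical sample data; the labels are chosen only for illustration
df = pds.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["r1", "r2", "r3"],
    columns=["A", "B", "C"],
)

# Dropping a row: axis=0 and index= are two spellings of the same request
rows_by_axis = df.drop("r2", axis=0)
rows_by_keyword = df.drop(index="r2")

# Dropping a column: axis=1 and columns= are equivalent as well
cols_by_axis = df.drop("B", axis=1)
cols_by_keyword = df.drop(columns="B")

# Both comparisons should report that the results are identical
print(rows_by_axis.equals(rows_by_keyword))
print(cols_by_axis.equals(cols_by_keyword))

# drop() returned new objects, so the original still has all rows and columns
print(df.shape)
```

The index and columns keywords tend to read more clearly than a numeric axis, since the call itself states whether rows or columns are being removed.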
Remove rows or columns of DataFrame using truncate():
- The truncate() method removes the rows or columns whose index labels come before the before value or after the after value. The before and after parameters are the thresholds: labels equal to before or after are kept, everything outside that range is discarded, and a new DataFrame is returned.
- By default, truncate() removes rows. To remove columns, pass axis="columns" when calling the method.
- Similarly, the truncate() and drop() methods of the pandas Series class can be used on a Series instance, which is a one-dimensional container.
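The Series case mentioned in the last point can be sketched in the same way. This is a minimal illustration, assuming a hypothetical Series with the default integer index 0 to 4. It shows drop() removing the listed labels and truncate() keeping the labels between before and after, both ends included.

```python
import pandas as pds

# Hypothetical Series with the default integer index 0..4
series = pds.Series([10, 20, 30, 40, 50])

# drop() removes the entries stored under the given index labels
dropped = series.drop([1, 3])

# truncate() keeps the labels from before to after, both inclusive
truncated = series.truncate(before=1, after=3)

print(dropped.tolist())    # values left under labels 0, 2 and 4
print(truncated.tolist())  # values left under labels 1, 2 and 3
```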
Example – Remove a column from a DataFrame using drop():
# Example Python program that removes a specific column from
# a pandas DataFrame instance
import pandas as pds

# A dictionary of cities vs some measurement
subjectVsScores = {"City1": [78, 45, 50, 85, 63, 72, 40, 78, 60, 69],
                   "City2": [52, 67, 81, 65, 83, 42, 76, 54, 73, 81],
                   "City3": [67, 55, 81, 62, 58, 71, 57, 45, 49, 51]}

# Create a dataframe using the dictionary
dataFrame = pds.DataFrame(subjectVsScores)
print(dataFrame)

# Drop data for City2 from the DataFrame (axis=1 selects columns)
newFrame = dataFrame.drop("City2", axis=1)
print("New DataFrame after removal of a column:")
print(newFrame)
Output:
   City1  City2  City3
0     78     52     67
1     45     67     55
2     50     81     81
3     85     65     62
4     63     83     58
5     72     42     71
6     40     76     57
7     78     54     45
8     60     73     49
9     69     81     51
New DataFrame after removal of a column:
   City1  City3
0     78     67
1     45     55
2     50     81
3     85     62
4     63     58
5     72     71
6     40     57
7     78     45
8     60     49
9     69     51
Example – Removing columns from a pandas DataFrame using drop():
# Example Python program to drop columns
# from a pandas DataFrame instance using
# drop() method
import pandas as pds

# A matrix as list of tuples
matrix = [(0, 20, 30),
          (40, 0, 60),
          (70, 80, 0)]

# Create a DataFrame instance
df = pds.DataFrame(matrix)

# Remove 1st and 3rd columns
newdf1 = df.drop([0, 2], axis=1)
print("New DataFrame after removing 1st and 3rd columns:")
print(newdf1)

# Remove 2nd column
newdf2 = df.drop(columns=1)
print("New DataFrame after removing 2nd column:")
print(newdf2)
Output:
New DataFrame after removing 1st and 3rd columns:
    1
0  20
1   0
2  80
New DataFrame after removing 2nd column:
    0   2
0   0  30
1  40  60
2  70   0
Example – Removing rows and columns from a pandas DataFrame using drop():
# Example Python program that removes rows and columns
# from a pandas DataFrame
import pandas as pds

dataMatrix = [(0.6, 0.7, 0.2),
              (0.1, 0.4, 0.5),
              (0.3, 0.9, 0.2),
              (0.6, 0.8, 0.1)]

dataFrame = pds.DataFrame(dataMatrix,
                          columns=["E1", "E2", "E3"],
                          index=["E5", "E6", "E7", "E8"])

print("Original DataFrame:")
print(dataFrame)

# Remove 2nd row and 3rd column to produce a new DataFrame
newdf = dataFrame.drop(index="E6", columns="E3")
print("New DataFrame obtained by removing 2nd row and 3rd column:")
print(newdf)
Output:
Original DataFrame:
     E1   E2   E3
E5  0.6  0.7  0.2
E6  0.1  0.4  0.5
E7  0.3  0.9  0.2
E8  0.6  0.8  0.1
New DataFrame obtained by removing 2nd row and 3rd column:
     E1   E2
E5  0.6  0.7
E7  0.3  0.9
E8  0.6  0.8
Example – Removing rows using thresholds on the DataFrame index with truncate():
# Example Python program that truncates rows of a pandas DataFrame using truncate()
import pandas as pds

matrix = [(0, 1, 3),
          (1, 2, 1),
          (2, 3, 0)]

df = pds.DataFrame(matrix, columns=["a", "b", "c"])

# Keep only the rows whose index label lies between 1 and 1 (inclusive)
newdf = df.truncate(before=1, after=1)
print("Contents of the DataFrame:")
print(df)

print("Contents of the DataFrame after removing rows using truncate():")
print(newdf)
Output:
Contents of the DataFrame:
   a  b  c
0  0  1  3
1  1  2  1
2  2  3  0
Contents of the DataFrame after removing rows using truncate():
   a  b  c
1  1  2  1
Example – Removal of DataFrame columns using truncate():
# Example Python program that removes columns from a
# pandas DataFrame instance using the method truncate()
import pandas as pds

data = [(5, 0, 5, 5),
        (10, 1, 10, 2),
        (15, 10, 5, 10)]

# Create a DataFrame instance
dataFrameObject = pds.DataFrame(data)
print("Rows and columns of the original DataFrame:")
print(dataFrameObject)

# Remove columns and print the new DataFrame
newDataFrame = dataFrameObject.truncate(before=1, after=2, axis="columns")
print("DataFrame contents after removal of columns using truncate():")
print(newDataFrame)
Output:
Rows and columns of the original DataFrame:
    0   1   2   3
0   5   0   5   5
1  10   1  10   2
2  15  10   5  10
DataFrame contents after removal of columns using truncate():
    1   2
0   0   5
1   1  10
2  10   5