Overview:
- A DataFrame is organized as a set of rows and columns identified by the row index/row labels and column index/column labels.
- In data analysis, it is common to sort the contents of a DataFrame by their values, either column-wise or row-wise. It is equally common to sort a DataFrame by its row index or column index.
Sorting the contents of a DataFrame by values:
- The sort_values() method sorts a DataFrame by columns or by rows based on the values of its elements and, by default, returns the sorted DataFrame as a new DataFrame instance.
- With axis=0 (the default) the rows are reordered, and the parameter "by" names the column label(s) to sort on. With axis=1 the columns are reordered, and "by" names the row label(s) to sort on.
- The parameter "ascending" controls whether the sorted DataFrame contents are in increasing or decreasing order.
- The sorting algorithm is chosen through the parameter "kind", which accepts one of "quicksort" (the default), "mergesort", "heapsort" or "stable". This option is only applied when sorting on a single column or label.
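The parameters described above can be combined as in the following minimal sketch. The sample data and the variable names (df, mixed, stable) are made up for illustration and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"a": [3, 1, 3, 17], "b": [13, 3, 11, 12]},
    index=["w", "x", "y", "z"],
)

# Sort by "a" in decreasing order, then by "b" in increasing order
# among rows that share the same value of "a".
mixed = df.sort_values(by=["a", "b"], ascending=[False, True])

# Ask for a stable algorithm explicitly; "kind" is honoured here
# because the sort is on a single column.
stable = df.sort_values(by="a", kind="mergesort")

print(mixed)
print(stable)

# sort_values returned new DataFrames; df keeps its original row order.
print(df)
```

Passing a list to "ascending" gives one direction per entry in "by", so the two lists are expected to have the same length.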
Example – Sorting the DataFrame contents by row values:
# Example Python program to sort the contents
# of a DataFrame by its values along axis=0 (row-wise)
import pandas as pd

contents = [(1, 3, 8, 9),
            (17, 12, 5, 4),
            (3, 13, 8, 19),
            (3, 11, 9, 7)]

dataFrame = pd.DataFrame(data=contents)
dataFrame.columns = ["a", "b", "c", "d"]

print("Contents of the DataFrame:")
print(dataFrame)

# Rows are ordered by column a first; ties in a are broken by column b
print("Contents of the DataFrame:After sorting rows based on columns a and b")
print(dataFrame.sort_values(by=["a", "b"]))
Output:
Contents of the DataFrame:
    a   b  c   d
0   1   3  8   9
1  17  12  5   4
2   3  13  8  19
3   3  11  9   7
Contents of the DataFrame:After sorting rows based on columns a and b
    a   b  c   d
0   1   3  8   9
3   3  11  9   7
2   3  13  8  19
1  17  12  5   4
Example - Sorting the DataFrame contents by column values:
# Example Python program to sort the contents
# of a DataFrame by its values along axis=1 (column-wise)
import pandas as pd

contents = [("x", "a", "g", "l"),
            ("o", "t", "o", "b"),
            ("r", "l", "v", "w"),
            ("x", "a", "z", "d")]

dataFrame = pd.DataFrame(data=contents, index=("r1", "r2", "r3", "r4"))
dataFrame.columns = ["C1", "C2", "C3", "C4"]

print("Contents of the DataFrame:")
print(dataFrame)

# Columns are ordered by the values in row r2; ties are broken by row r3
print("Contents of the DataFrame:After sorting columns based on rows r2 and r3")
print(dataFrame.sort_values(by=["r2", "r3"], axis=1))
Output:
Contents of the DataFrame:
    C1  C2  C3  C4
r1   x   a   g   l
r2   o   t   o   b
r3   r   l   v   w
r4   x   a   z   d
Contents of the DataFrame:After sorting columns based on rows r2 and r3
    C4  C1  C3  C2
r1   l   x   g   a
r2   b   o   o   t
r3   w   r   v   l
r4   d   x   z   a
Sorting DataFrame contents based on DataFrame index:
- Data is typically loaded into a DataFrame from external sources such as a database, a spreadsheet, or a text file whose fields are separated by a delimiter (comma, pipe, tab, and so on).
- When a DataFrame is loaded from an external source, the incoming rows or columns are often not sorted by their row labels or column labels.
- A DataFrame has two indexes, one for the rows and one for the columns. The sort_index() method sorts a DataFrame by either its row index or its column index.
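As a rough sketch of the loading scenario described above, the snippet below reads comma-delimited text whose rows and columns arrive unsorted and then sorts by each index. The text in csv_text, the "id" field and the variable names are hypothetical; an in-memory string stands in for an external file.

```python
import io

import pandas as pd

# Hypothetical comma-delimited text standing in for an external file.
csv_text = "id,c,a,b\n3,x,1,2.5\n1,y,4,0.5\n2,z,9,1.5\n"

# Use the "id" field as the row index; rows arrive in file order (3, 1, 2)
# and columns in file order (c, a, b).
df = pd.read_csv(io.StringIO(csv_text), index_col="id")

by_rows = df.sort_index(axis=0)  # reorder rows by row label
by_cols = df.sort_index(axis=1)  # reorder columns by column label

# The two sorts are independent, so they can be chained.
both = df.sort_index(axis=0).sort_index(axis=1)

print(by_rows)
print(by_cols)
print(both)
```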
Example: Sorting the DataFrame contents based on row index, i.e., axis=0
- This Python example code sorts the contents of the DataFrame based on row-index.
# Python example program to sort a pandas DataFrame
# using its row index (axis=0)
import pandas as pd

# Data as list of tuples
data = [(1, 1, 2, 3),
        (1, 4, 9, 16),
        (1, 5, 12, 22),
        (1, 6, 15, 28),
        (1, 11, 21, 1211)]

# Create a pandas DataFrame with row index
df = pd.DataFrame(data, index=(2, 1, 0, 4, 3), columns=("a", "b", "c", "d"))

print("Contents of the DataFrame:")
print(df)

print("Contents of the DataFrame after sorting based on row index:")
print(df.sort_index(axis=0))
Output:
Contents of the DataFrame:
   a   b   c     d
2  1   1   2     3
1  1   4   9    16
0  1   5  12    22
4  1   6  15    28
3  1  11  21  1211
Contents of the DataFrame after sorting based on row index:
   a   b   c     d
0  1   5  12    22
1  1   4   9    16
2  1   1   2     3
3  1  11  21  1211
4  1   6  15    28
Example: Sorting the DataFrame contents based on column index, i.e., axis=1
- The Python example code below sorts the contents of the DataFrame based on column-index.
# Python example program to sort a pandas DataFrame
# using its column index (axis=1)
import pandas as pd

# Data as python dictionary; each key becomes a column label.
# Keys must be unique: a repeated key silently overwrites the earlier value.
data = {"d": ["i", "j", "k", "l"],
        "a": ["a", "b", "c", "d"],
        "c": ["e", "f", "g", "h"]}

# Create a pandas DataFrame with column index
df = pd.DataFrame(data)

print("Contents of the DataFrame:")
print(df)

print("Contents of the DataFrame after sorting based on column index:")
print(df.sort_index(axis=1))
Output:
Contents of the DataFrame:
   d  a  c
0  i  a  e
1  j  b  f
2  k  c  g
3  l  d  h
Contents of the DataFrame after sorting based on column index:
   a  c  d
0  a  e  i
1  b  f  j
2  c  g  k
3  d  h  l
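Both index-sorting examples use the default increasing order. Like sort_values(), sort_index() also takes an "ascending" parameter, so a decreasing sort can be sketched as follows. The data and variable names here are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data with unsorted row labels and column labels.
df = pd.DataFrame(
    {"b": [10, 20, 30], "a": [1, 2, 3]},
    index=[2, 0, 1],
)

# Row labels in decreasing order.
rows_desc = df.sort_index(axis=0, ascending=False)

# Column labels in decreasing order.
cols_desc = df.sort_index(axis=1, ascending=False)

print(rows_desc)
print(cols_desc)
```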