Sorting the contents of a pandas DataFrame in Python

Overview:

  • A DataFrame is organized as a set of rows and columns, identified by row labels (the row index) and column labels (the column index).
  • In data analysis, it is often necessary to sort the contents of a DataFrame by value, either row-wise or column-wise. Sorting a DataFrame by its row index or column index is an equally common requirement.

Sorting the contents of a DataFrame by values:

  • The sort_values() method sorts a DataFrame by the values of its elements and, by default, returns the sorted result as a new DataFrame instance, leaving the original unchanged.
  • With axis=0 (the default), the rows are reordered and the parameter "by" names the column or columns to sort on. With axis=1, the columns are reordered and "by" names the row label or labels to sort on.
  • The parameter "ascending" controls whether the sorted contents are in increasing or decreasing order.
  • The sorting algorithm is selected through the parameter "kind", which accepts "quicksort", "mergesort", "heapsort" or "stable". pandas applies this option only when sorting on a single column or label.
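
The "ascending" and "kind" parameters described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the variable names mixed and stable are chosen here for the example, and the data is the same as in the row-wise example of this article. Passing a list to "ascending" sets the direction for each column named in "by".

```python
import pandas as pd

df = pd.DataFrame(
    [(1, 3, 8, 9), (17, 12, 5, 4), (3, 13, 8, 19), (3, 11, 9, 7)],
    columns=["a", "b", "c", "d"],
)

# Column "a" in increasing order; ties in "a" are broken
# by column "b" in decreasing order.
mixed = df.sort_values(by=["a", "b"], ascending=[True, False])
print(mixed)

# Sorting on a single column with a stable algorithm: rows with
# equal values in "a" keep their original relative order.
stable = df.sort_values(by="a", kind="mergesort")
print(stable)
```

In both results the two rows with a=3 appear between the rows with a=1 and a=17. In the first call their order is decided by column "b"; in the second it is decided by their original positions.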

 

Example – Sorting the DataFrame contents by row values:

# Example Python program to sort the contents
# of a DataFrame by its values along axis=0 (row-wise)
import pandas as pd

contents = [(1, 3, 8, 9),
            (17, 12, 5, 4),
            (3, 13, 8, 19),
            (3, 11, 9, 7)]

dataFrame = pd.DataFrame(data=contents)
dataFrame.columns = ["a", "b", "c", "d"]

print("Contents of the DataFrame:")
print(dataFrame)

print("Contents of the DataFrame:After sorting rows based on columns a and b")
print(dataFrame.sort_values(by=["a", "b"]))

Output:

Contents of the DataFrame:
    a   b  c   d
0   1   3  8   9
1  17  12  5   4
2   3  13  8  19
3   3  11  9   7
Contents of the DataFrame:After sorting rows based on columns a and b
    a   b  c   d
0   1   3  8   9
3   3  11  9   7
2   3  13  8  19
1  17  12  5   4
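
In the sorted output, each row keeps its original index label (0, 3, 2, 1), and the DataFrame on which sort_values() was called is not modified. The sketch below illustrates both points, and also shows the "ignore_index" parameter, which relabels the sorted rows 0, 1, 2, and so on. It assumes pandas 1.0 or later, where "ignore_index" is available; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [(1, 3, 8, 9), (17, 12, 5, 4), (3, 13, 8, 19), (3, 11, 9, 7)],
    columns=["a", "b", "c", "d"],
)

# sort_values() returns a new DataFrame; df itself is left unchanged.
sorted_df = df.sort_values(by=["a", "b"])
print(list(df.index))         # row labels of the original
print(list(sorted_df.index))  # row labels travel with their rows

# ignore_index=True discards the old labels and renumbers the rows.
renumbered = df.sort_values(by=["a", "b"], ignore_index=True)
print(renumbered)
```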

 

 

Example – Sorting the DataFrame contents by column values:

# Example Python program to sort the contents
# of a DataFrame by its values along axis=1 (column-wise)
import pandas as pd

contents = [("x", "a", "g", "l"),
            ("o", "t", "o", "b"),
            ("r", "l", "v", "w"),
            ("x", "a", "z", "d")]

dataFrame = pd.DataFrame(data=contents, index=("r1", "r2", "r3", "r4"))
dataFrame.columns = ["C1", "C2", "C3", "C4"]

print("Contents of the DataFrame:")
print(dataFrame)

print("Contents of the DataFrame:After sorting columns based on rows r2 and r3")
print(dataFrame.sort_values(by=["r2", "r3"], axis=1))

Output:

Contents of the DataFrame:
   C1 C2 C3 C4
r1  x  a  g  l
r2  o  t  o  b
r3  r  l  v  w
r4  x  a  z  d
Contents of the DataFrame:After sorting columns based on rows r2 and r3
   C4 C1 C3 C2
r1  l  x  g  a
r2  b  o  o  t
r3  w  r  v  l
r4  d  x  z  a
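
Column-wise sorting works the same way for numeric data and can be combined with the "ascending" parameter. The following is a small sketch with made-up values; the names df and by_r1_desc are chosen for illustration. It reorders the columns so that the values in row r1 run from largest to smallest.

```python
import pandas as pd

df = pd.DataFrame(
    [[3, 1, 2],
     [9, 7, 8]],
    index=["r1", "r2"],
    columns=["C1", "C2", "C3"],
)

# axis=1 reorders the columns; "by" names the row whose values
# decide the new column order.
by_r1_desc = df.sort_values(by="r1", axis=1, ascending=False)
print(by_r1_desc)
```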

 

 

 

Sorting DataFrame contents based on DataFrame index:

  • Data is typically loaded into a DataFrame from external sources such as a database, a spreadsheet or a text file whose fields are separated by a delimiter (comma, pipe, tab and so on).
  • When a DataFrame is loaded from an external source, the incoming rows or columns are often not in the order of their row labels or column labels.
  • A DataFrame has two indexes, one for the rows and one for the columns. The sort_index() method sorts a DataFrame by either of them: the row index (axis=0) or the column index (axis=1).
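
As a rough sketch of the loading scenario described above, the snippet below reads delimited text whose rows and columns arrive unordered and then sorts on each index. The CSV text, the "id" column and the variable names are hypothetical stand-ins for a real external source; io.StringIO is used only so that the example runs without a file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for an external file;
# neither the row ids nor the column names arrive in sorted order.
csv_text = """id,c,a,b
3,30,10,20
1,31,11,21
2,32,12,22
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="id")

by_rows = df.sort_index()        # axis=0 is the default: sort by row index
by_cols = df.sort_index(axis=1)  # sort by column index
both = df.sort_index(axis=0).sort_index(axis=1)
print(both)
```

Each call returns a new DataFrame, so the calls can be chained to sort on both indexes.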

Example: Sorting the DataFrame contents based on row index, i.e., axis=0

  • The Python example code below sorts the contents of the DataFrame based on the row index.

# Python example program to sort a pandas DataFrame
# using its row index (axis=0)
import pandas as pd

# Data as list of tuples
data = [(1,  1,  2,    3),
        (1,  4,  9,   16),
        (1,  5, 12,   22),
        (1,  6, 15,   28),
        (1, 11, 21, 1211)]

# Create a pandas DataFrame with row index
df = pd.DataFrame(data, index=(2, 1, 0, 4, 3), columns=("a", "b", "c", "d"))

print("Contents of the DataFrame:")
print(df)

print("Contents of the DataFrame after sorting based on row index:")
print(df.sort_index(axis=0))

Output:

Contents of the DataFrame:
   a   b   c     d
2  1   1   2     3
1  1   4   9    16
0  1   5  12    22
4  1   6  15    28
3  1  11  21  1211
Contents of the DataFrame after sorting based on row index:
   a   b   c     d
0  1   5  12    22
1  1   4   9    16
2  1   1   2     3
3  1  11  21  1211
4  1   6  15    28

Example: Sorting the DataFrame contents based on column index, i.e., axis=1

  • The Python example code below sorts the contents of the DataFrame based on the column index.

# Python example program to sort a pandas DataFrame
# using its column index (axis=1)
import pandas as pd

# Data as a Python dictionary; each key becomes a column label
data = {"d": ["i", "j", "k", "l"],
        "a": ["a", "b", "c", "d"],
        "c": ["e", "f", "g", "h"]}

# Create a pandas DataFrame with column index
df = pd.DataFrame(data)

print("Contents of the DataFrame:")
print(df)

print("Contents of the DataFrame after sorting based on column index:")
print(df.sort_index(axis=1))

Output:

Contents of the DataFrame:
   d  a  c
0  i  a  e
1  j  b  f
2  k  c  g
3  l  d  h
Contents of the DataFrame after sorting based on column index:
   a  c  d
0  a  e  i
1  b  f  j
2  c  g  k
3  d  h  l


Copyright 2024 © pythontic.com