Select smallest and largest DataFrame rows based on column ordering

Overview:

The pandas DataFrame objects are used in several Data Analysis applications to store large amount of in-memory data akin to a table like structure.
Often pandas DataFrame instances are loaded from external feeds and external SQL repositories for analysis and computations.
DataFrames are often queried for top ‘n’ records or bottom ‘n’ records. The methods nsmallest() and nlargest() retrieve the smallest and largest ‘n’ records based on user defined column ordering.

Select specified number of largest rows from a DataFrame:

The Python code below loads data in Feather format into a pandas DataFrame.
Using the nlargest() function, the top three records of the DataFrame reporting higher round-trip times are printed onto the console.
The "columns" parameter is used to define the column ordering in the selected rows.
Using "ascending" parameter returned rows can be arranged in either increasing or decreasing order.

Example:

# Example Python program to find the ping records with

# longest response time

# Records are loaded from an external Feather file

import pandas as pd

import feather

# Load DataFile from Feather file

pingFile = "./serverpings.ftr";

dataFrame = pd.read_feather(pingFile, columns=None, use_threads=True);

print("Original DataFrame:");

print(dataFrame);

# Most delayed pings

delayedPings = dataFrame.nlargest(3, columns="roundtriptime");

print("DataFrame rows with 3 largest roundtriptime:");

print(type(delayedPings));

print(delayedPings);

Output:

Original DataFrame:

servername lastping roundtriptime status

0 svr_et_1 12.20.15.122 300 PASS

1 svr_et_2 12.20.11.395 400 PASS

2 svr_wt_1 12.20.12.836 0 FAIL

3 svr_wt_2 12.20.16.769 200 PASS

4 svr_nr_1 12.20.17.193 100 PASS

5 svr_nr_2 12.20.18.416 500 PASS

6 svr_st_1 11.59.55.913 350 PASS

7 svr_st_2 12.20.14.811 0 FAIL

DataFrame rows with 3 largest roundtriptime:

servername lastping roundtriptime status

5 svr_nr_2 12.20.18.416 500 PASS

1 svr_et_2 12.20.11.395 400 PASS

6 svr_st_1 11.59.55.913 350 PASS

Select specified number of smallest rows from a DataFrame:

The Python example below loads a set of rows from a MySQL server into a pandas DataFrame.
Using the nsmallest() function, DataFrame rows with the lowest customer feedback score are selected and printed onto the console.

Example:

# Example Python program to select a set of smallest records

# from a DataFrame

from sqlalchemy import create_engine

import pymysql

import pandas as pd

# Load MySQL table into a DataFrame

mySQLEngine = create_engine('mysql+pymysql://root:@127.0.0.1', pool_recycle=3600);

dbConnection = mySQLEngine.connect();

dataFrame = pd.read_sql("select * from test.uservitals", dbConnection);

print("Contents of the DataFrame:");

print(dataFrame);

# Select two user records with worst customer feedback

badFeedback = dataFrame.nsmallest(2, columns="AverageOrderRating");

print(badFeedback);

dbConnection.close()

Output:

Contents of the DataFrame:

index UserId UserFavourite MonthlyOrderFrequency ... LastOrderRating AverageOrderRating OrderMode InMedicalCare

0 0 xxxxx Greek Salad 5 ... 3 3 Web No

1 1 yyyyy Philly Cheese Steak 1 ... 3 4 App No

2 2 zzzzz Turkey Burger 2 ... 3 2 App No

3 3 aaaaa Crispy Orange Chicken 2 ... 2 1 App No

4 4 bbbbb Atlantic Salmon 7 ... 3 3 Web Yes

5 5 ccccc Pot roast 6 ... 2 4 Web No

6 6 ddddd Banana split 1 ... 4 3 App No

[7 rows x 10 columns]

index UserId UserFavourite MonthlyOrderFrequency ... LastOrderRating AverageOrderRating OrderMode InMedicalCare

3 3 aaaaa Crispy Orange Chicken 2 ... 2 1 App No

2 2 zzzzz Turkey Burger 2 ... 3 2 App No

[2 rows x 10 columns]

Select Smallest And Largest DataFrame Rows Based On Column Ordering

Overview:

Select specified number of largest rows from a DataFrame:

Example:

Output:

Select specified number of smallest rows from a DataFrame:

Example:

Output: