Select smallest and largest DataFrame rows based on column ordering

Overview:

  • The pandas DataFrame objects are used in several Data Analysis applications to store large amount of in-memory data akin to a table like structure.
  • Often pandas DataFrame instances are loaded from external feeds and external SQL repositories for analysis and computations.
  • DataFrames are often queried for top ‘n’ records or bottom ‘n’ records. The methods nsmallest() and nlargest() retrieve the smallest and largest ‘n’ records based on user defined column ordering.

Select specified number of largest rows from a DataFrame:

  • The Python code below loads data in Feather format into a pandas DataFrame.
  • Using the nlargest() function, the top three records of the DataFrame reporting higher round-trip times are printed onto the console.
  • The "columns" parameter is used to define the column ordering in the selected rows.
  • Using "ascending" parameter returned rows can be arranged in either increasing or decreasing order.

Example:

# Example Python program to find the ping records with

# longest response time

# Records are loaded from an external Feather file

import pandas as pd

import feather

 

# Load DataFile from Feather file

pingFile = "./serverpings.ftr";

dataFrame = pd.read_feather(pingFile, columns=None, use_threads=True);

print("Original DataFrame:");

print(dataFrame);

 

# Most delayed pings

delayedPings = dataFrame.nlargest(3, columns="roundtriptime");

print("DataFrame rows with 3 largest roundtriptime:");

print(type(delayedPings));

print(delayedPings);

 

Output:

Original DataFrame:

  servername      lastping  roundtriptime status

0   svr_et_1  12.20.15.122            300   PASS

1   svr_et_2  12.20.11.395            400   PASS

2   svr_wt_1  12.20.12.836              0   FAIL

3   svr_wt_2  12.20.16.769            200   PASS

4   svr_nr_1  12.20.17.193            100   PASS

5   svr_nr_2  12.20.18.416            500   PASS

6   svr_st_1  11.59.55.913            350   PASS

7   svr_st_2  12.20.14.811              0   FAIL

DataFrame rows with 3 largest roundtriptime:

<class 'pandas.core.frame.DataFrame'>

  servername      lastping  roundtriptime status

5   svr_nr_2  12.20.18.416            500   PASS

1   svr_et_2  12.20.11.395            400   PASS

6   svr_st_1  11.59.55.913            350   PASS

 

 

Select specified number of smallest rows from a DataFrame:

 

Example:

# Example Python program to select a set of smallest records 

# from a DataFrame

from sqlalchemy import create_engine

import pymysql

import pandas as pd

 

# Load MySQL table into a DataFrame

mySQLEngine     = create_engine('mysql+pymysql://root:@127.0.0.1', pool_recycle=3600);

dbConnection    = mySQLEngine.connect();

dataFrame       = pd.read_sql("select * from test.uservitals", dbConnection);

 

print("Contents of the DataFrame:");

print(dataFrame);

 

# Select two user records with worst customer feedback

badFeedback = dataFrame.nsmallest(2, columns="AverageOrderRating");

print(badFeedback);

 

dbConnection.close()

 

 

Output:

Contents of the DataFrame:

   index UserId          UserFavourite  MonthlyOrderFrequency  ...  LastOrderRating  AverageOrderRating  OrderMode  InMedicalCare

0      0  xxxxx            Greek Salad                      5  ...                3                   3        Web             No

1      1  yyyyy    Philly Cheese Steak                      1  ...                3                   4        App             No

2      2  zzzzz          Turkey Burger                      2  ...                3                   2        App             No

3      3  aaaaa  Crispy Orange Chicken                      2  ...                2                   1        App             No

4      4  bbbbb        Atlantic Salmon                      7  ...                3                   3        Web            Yes

5      5  ccccc              Pot roast                      6  ...                2                   4        Web             No

6      6  ddddd           Banana split                      1  ...                4                   3        App             No

 

[7 rows x 10 columns]

   index UserId          UserFavourite  MonthlyOrderFrequency  ...  LastOrderRating  AverageOrderRating  OrderMode  InMedicalCare

3      3  aaaaa  Crispy Orange Chicken                      2  ...                2                   1        App             No

2      2  zzzzz          Turkey Burger                      2  ...                3                   2        App             No

 

[2 rows x 10 columns]

 

 

 


Copyright 2024 © pythontic.com