Overview:
- The pandas DataFrame objects are used in several Data Analysis applications to store large amount of in-memory data akin to a table like structure.
- Often pandas DataFrame instances are loaded from external feeds and external SQL repositories for analysis and computations.
- DataFrames are often queried for top ‘n’ records or bottom ‘n’ records. The methods nsmallest() and nlargest() retrieve the smallest and largest ‘n’ records based on user defined column ordering.
Select specified number of largest rows from a DataFrame:
- The Python code below loads data in Feather format into a pandas DataFrame.
- Using the nlargest() function, the top three records of the DataFrame reporting higher round-trip times are printed onto the console.
- The "columns" parameter is used to define the column ordering in the selected rows.
- Using "ascending" parameter returned rows can be arranged in either increasing or decreasing order.
Example:
# Example Python program to find the ping records with # longest response time # Records are loaded from an external Feather file import pandas as pd import feather
# Load DataFile from Feather file pingFile = "./serverpings.ftr"; dataFrame = pd.read_feather(pingFile, columns=None, use_threads=True); print("Original DataFrame:"); print(dataFrame);
# Most delayed pings delayedPings = dataFrame.nlargest(3, columns="roundtriptime"); print("DataFrame rows with 3 largest roundtriptime:"); print(type(delayedPings)); print(delayedPings); |
Output:
Original DataFrame: servername lastping roundtriptime status 0 svr_et_1 12.20.15.122 300 PASS 1 svr_et_2 12.20.11.395 400 PASS 2 svr_wt_1 12.20.12.836 0 FAIL 3 svr_wt_2 12.20.16.769 200 PASS 4 svr_nr_1 12.20.17.193 100 PASS 5 svr_nr_2 12.20.18.416 500 PASS 6 svr_st_1 11.59.55.913 350 PASS 7 svr_st_2 12.20.14.811 0 FAIL DataFrame rows with 3 largest roundtriptime: <class 'pandas.core.frame.DataFrame'> servername lastping roundtriptime status 5 svr_nr_2 12.20.18.416 500 PASS 1 svr_et_2 12.20.11.395 400 PASS 6 svr_st_1 11.59.55.913 350 PASS |
Select specified number of smallest rows from a DataFrame:
- The Python example below loads a set of rows from a MySQL server into a pandas DataFrame.
- Using the nsmallest() function, DataFrame rows with the lowest customer feedback score are selected and printed onto the console.
Example:
# Example Python program to select a set of smallest records # from a DataFrame from sqlalchemy import create_engine import pymysql import pandas as pd
# Load MySQL table into a DataFrame mySQLEngine = create_engine('mysql+pymysql://root:@127.0.0.1', pool_recycle=3600); dbConnection = mySQLEngine.connect(); dataFrame = pd.read_sql("select * from test.uservitals", dbConnection);
print("Contents of the DataFrame:"); print(dataFrame);
# Select two user records with worst customer feedback badFeedback = dataFrame.nsmallest(2, columns="AverageOrderRating"); print(badFeedback);
dbConnection.close() |
Output:
Contents of the DataFrame: index UserId UserFavourite MonthlyOrderFrequency ... LastOrderRating AverageOrderRating OrderMode InMedicalCare 0 0 xxxxx Greek Salad 5 ... 3 3 Web No 1 1 yyyyy Philly Cheese Steak 1 ... 3 4 App No 2 2 zzzzz Turkey Burger 2 ... 3 2 App No 3 3 aaaaa Crispy Orange Chicken 2 ... 2 1 App No 4 4 bbbbb Atlantic Salmon 7 ... 3 3 Web Yes 5 5 ccccc Pot roast 6 ... 2 4 Web No 6 6 ddddd Banana split 1 ... 4 3 App No
[7 rows x 10 columns] index UserId UserFavourite MonthlyOrderFrequency ... LastOrderRating AverageOrderRating OrderMode InMedicalCare 3 3 aaaaa Crispy Orange Chicken 2 ... 2 1 App No 2 2 zzzzz Turkey Burger 2 ... 3 2 App No
[2 rows x 10 columns]
|