Printing the summary of a DataFrame object

Overview:

  • The pandas library and its data structures, Series and DataFrame, are used extensively in data analytics applications that process large volumes of data.
  • Analytics are performed while the DataFrames and other objects are held in memory.
  • A Python developer will often want to know the size and makeup of the main data structures used in an analytics application.
  • The info() method prints a summary of a DataFrame object, which includes:
    • The class name of the DataFrame
    • The index type and its range of entries (a RangeIndex in the example here)
    • The number of data columns
    • The number of non-null values in each data column
    • The dtype of each column, along with a count of columns per dtype. A column that stores heterogeneous Python objects is reported with the dtype object.
    • The memory usage:
      • By default, an estimate based on the column dtypes and the number of rows is given. When object columns are present, the figure is shown with a trailing + because the objects they refer to are not measured.
      • If the memory_usage parameter is passed as 'deep', pandas inspects the stored objects as well and reports a more accurate figure, at a higher computational cost.
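
Because info() prints its summary rather than returning it, the text can be captured by passing a writable buffer through the buf parameter. The following is a minimal sketch of that idea; the sample data and the names frame, buffer, result and summary are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration
frame = pd.DataFrame({"name": ["Sky", None, "Blue"],
                      "value": [1.0, 2.5, 4.0]})

# Send the summary to an in-memory buffer instead of standard output
buffer = io.StringIO()
result = frame.info(buf=buffer)
summary = buffer.getvalue()

print(result is None)           # info() itself has no return value
print(summary.splitlines()[0])  # the first line names the class
```

Capturing the text this way is useful when the summary needs to go to a log file or a report instead of the console.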

Example - With an estimate of memory usage:

# Example Python program that prints a summary
# of a pandas DataFrame
import pandas as pds

# Heterogeneous objects: each inner list becomes one row
hetero = ([1, 1.1, "A"],
          [(1, 2, 3), {"abc": 1, "xyz": 2}, True],
          ["Sky", None, "Blue"])

# Create a DataFrame instance
dataFrame = pds.DataFrame(data=hetero)

print("DataFrame contents:")
print(dataFrame)

print("Summary of the DataFrame:")
# info() prints the summary itself and returns None,
# so the enclosing print() adds a final "None" line
print(dataFrame.info())

Output:

DataFrame contents:
           0                     1     2
0          1                   1.1     A
1  (1, 2, 3)  {'abc': 1, 'xyz': 2}  True
2        Sky                  None  Blue
Summary of the DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
0    3 non-null object
1    2 non-null object
2    3 non-null object
dtypes: object(3)
memory usage: 200.0+ bytes
None
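
The non-null counts and dtypes shown in the summary can also be obtained as objects, which helps when they need to be checked in code rather than read from printed text. This is a small sketch using the same heterogeneous rows; the names frame, non_null_counts and column_dtypes are chosen here for illustration.

```python
import pandas as pd

# Same heterogeneous rows as in the example program
hetero = ([1, 1.1, "A"],
          [(1, 2, 3), {"abc": 1, "xyz": 2}, True],
          ["Sky", None, "Blue"])
frame = pd.DataFrame(data=hetero)

# count() gives the number of non-null values per column;
# the None in column 1 is not counted
non_null_counts = frame.count()

# dtypes gives the dtype of each column
column_dtypes = frame.dtypes

print(non_null_counts)
print(column_dtypes)
```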

 

When info() is called on the same DataFrame with the memory_usage parameter set to "deep", that is, dataFrame.info(memory_usage="deep"), the memory usage line of the output changes to:

memory usage: 723.0 bytes
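
The difference between the two figures can be explored with the DataFrame.memory_usage() method, which returns per-column byte counts. The sketch below assumes that the totals reported by info() correspond to the sums of these values; the exact numbers depend on the pandas and Python versions in use, so none are shown here.

```python
import pandas as pd

# Same heterogeneous rows as in the example program
hetero = ([1, 1.1, "A"],
          [(1, 2, 3), {"abc": 1, "xyz": 2}, True],
          ["Sky", None, "Blue"])
frame = pd.DataFrame(data=hetero)

# Shallow estimate: object columns count only the references they hold
shallow_bytes = int(frame.memory_usage(deep=False).sum())

# Deep measurement: the stored Python objects are inspected as well
deep_bytes = int(frame.memory_usage(deep=True).sum())

print("shallow:", shallow_bytes)
print("deep:", deep_bytes)

# The summary printed here shows the deep figure, without a trailing +
frame.info(memory_usage="deep")
```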

 


Copyright 2024 © pythontic.com