Printing the summary of a DataFrame object

Overview:

The pandas library and its core data structures, Series and DataFrame, are used extensively in data analytics applications that process large volumes of data.
Analytics are performed while the DataFrames and other objects are held in memory.
A Python developer will therefore often want to know basic statistics about the main data structures involved in an analytics application.
The info() method prints a summary of a DataFrame object, which includes:
- The class name of the DataFrame
- The index of the DataFrame, with its number of entries and range (for example, a RangeIndex)
- The number of data columns
- The number of non-null values in each data column
- The data type (dtype) of each column, followed by a count of columns per dtype. A column that stores heterogeneous objects is reported as object.
- The memory usage:
  - By default, an estimate based on the column dtypes and the number of rows
  - If memory_usage='deep' is passed, pandas inspects the stored objects and reports the actual memory usage, at some extra computational cost.
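
The items listed above can also be captured as text rather than printed. The following is a minimal sketch, assuming a small made-up DataFrame (the column names name and score are illustrative only). It relies on the buf parameter of info(), which redirects the summary to any writable object instead of standard output.

```python
import io

import pandas as pds

# Hypothetical sample data; the column names are illustrative only
df = pds.DataFrame({"name": ["Ann", None, "Cy"], "score": [1.5, 2.0, 3.25]})

# info() writes to sys.stdout by default; buf sends the text elsewhere
buffer = io.StringIO()
result = df.info(buf=buffer)

# The captured summary is an ordinary string that can be logged or searched
summary = buffer.getvalue()
print(summary)

# info() itself returns None; the summary only exists as printed text
print("Return value of info():", result)
```

Capturing the text this way is handy when the summary should go to a log file instead of the console.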

# Example Python program that prints a summary
# of a pandas DataFrame
import pandas as pds

# Heterogeneous objects: each inner list becomes one row
hetero = ([1, 1.1, "A"],
          [(1, 2, 3), {"abc": 1, "xyz": 2}, True],
          ["Sky", None, "Blue"])

# Create a DataFrame instance
dataFrame = pds.DataFrame(data=hetero)

print("DataFrame contents:")
print(dataFrame)

print("Summary of the DataFrame:")

# info() prints the summary itself and returns None,
# so it is not wrapped in print()
dataFrame.info()

Output:

DataFrame contents:
           0                     1     2
0          1                   1.1     A
1  (1, 2, 3)  {'abc': 1, 'xyz': 2}  True
2        Sky                  None  Blue
Summary of the DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
0    3 non-null object
1    2 non-null object
2    3 non-null object
dtypes: object(3)
memory usage: 200.0+ bytes
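
The summary printed by info() is text only. When the same facts are needed as values, they can be read from the DataFrame directly. The sketch below rebuilds the example data and uses count(), dtypes and index, which are standard DataFrame members. It is one possible way to get at these values, not the mechanism info() uses internally.

```python
import pandas as pds

# Same heterogeneous rows as in the example program
hetero = ([1, 1.1, "A"],
          [(1, 2, 3), {"abc": 1, "xyz": 2}, True],
          ["Sky", None, "Blue"])
dataFrame = pds.DataFrame(data=hetero)

# Number of non-null values per column, as a Series
non_null_counts = dataFrame.count()

# dtype of each column, as a Series
column_dtypes = dataFrame.dtypes

print(dataFrame.index)            # the RangeIndex of the DataFrame
print(non_null_counts.tolist())   # [3, 2, 3]; column 1 holds one None
print(column_dtypes.tolist())     # every column holds mixed objects
```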

When memory_usage="deep" is passed to info() in the example program above, the memory usage line of the output changes as follows. The + suffix disappears because the figure is no longer a lower-bound estimate:

memory usage: 723.0 bytes
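
The difference between the default estimate and the deep measurement can be reproduced with a short sketch. It assumes the same example data and uses DataFrame.memory_usage(), which returns the per-column byte counts as a Series. The exact numbers depend on the platform and on the pandas and Python versions, so none are shown here.

```python
import pandas as pds

# Same heterogeneous rows as in the example program
hetero = ([1, 1.1, "A"],
          [(1, 2, 3), {"abc": 1, "xyz": 2}, True],
          ["Sky", None, "Blue"])
dataFrame = pds.DataFrame(data=hetero)

# Default summary: memory usage is an estimate based on dtypes and row count
dataFrame.info()

# Deep summary: pandas also measures the Python objects stored in the columns
dataFrame.info(memory_usage="deep")

# The same figures as numbers: per-column bytes (index included), then summed
shallow_bytes = int(dataFrame.memory_usage().sum())
deep_bytes = int(dataFrame.memory_usage(deep=True).sum())

print("Estimated bytes:", shallow_bytes)
print("Deep bytes:", deep_bytes)
```

For object columns the deep figure is larger, because the default estimate counts only the references held in each column and not the objects they point to.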