Overview:
- The pandas library and its two main data structures, Series and DataFrame, are used extensively in data analytics applications that process large volumes of data.
- The analysis is performed while the DataFrames and other objects are held in memory.
- A Python developer therefore often needs summary information about the main data structures used in an analytics application, such as their size, column types and memory footprint.
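- The DataFrame.info() method prints a concise summary of a DataFrame object, which includes:
  - The class name of the DataFrame
  - The index type (for example RangeIndex), with the number of entries and the first and last index labels
  - The total number of data columns
  - The number of non-null values in each column
  - The data type (dtype) of each column, followed by a count of columns per dtype. A column that stores heterogeneous objects is reported with the dtype object.
  - The memory usage:
    - By default, an estimate based on each column's dtype is shown. For object columns only the references are counted, so the figure carries a "+" suffix to indicate that the true usage could be higher.
    - If memory_usage="deep" is passed, pandas also measures the Python objects held in object columns and reports their actual memory consumption.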
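Most of the figures in this summary can also be read programmatically, which helps when a script needs the numbers rather than printed text. The sketch below uses the standard DataFrame methods count(), dtypes, value_counts() and memory_usage(). The column names and values are made up for illustration only.

```python
import pandas as pd

# Illustrative data (hypothetical): a float column with one missing value
# and a string column stored explicitly with dtype=object.
df = pd.DataFrame({
    "price": [1.5, None, 3.25],
    "city": pd.Series(["Oslo", "Lima", "Pune"], dtype=object),
})

# Non-null values per column (the "non-null" figures in info()).
non_null = df.count()

# dtype of each column, and the number of columns per dtype
# (the "dtypes: ..." line in info()).
dtypes = df.dtypes
columns_per_dtype = dtypes.value_counts()

# Per-column memory in bytes, including the index. The default is the
# dtype-based estimate; deep=True also measures the Python objects
# referenced by the object column.
shallow = df.memory_usage()
deep = df.memory_usage(deep=True)

print(non_null)
print(dtypes)
print(columns_per_dtype)
print(shallow)
print(deep)
```

The numeric column reports the same size either way. Only the object column grows under deep=True, because its strings live outside the column's array of references.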
Example - With an estimate of memory usage:
```python
# Example Python program that prints a summary
# of a pandas DataFrame
import pandas as pd

# Heterogeneous objects
hetero = ([1, 1.1, "A"],
          [(1, 2, 3), {"abc": 1, "xyz": 2}, True],
          ["Sky", None, "Blue"])

# Create a DataFrame instance
dataFrame = pd.DataFrame(data=hetero)

print("DataFrame contents:")
print(dataFrame)

print("Summary of the DataFrame:")
# info() prints the summary itself and returns None,
# so its result is not passed to print()
dataFrame.info()
```
Output:
```
DataFrame contents:
           0                     1     2
0          1                   1.1     A
1  (1, 2, 3)  {'abc': 1, 'xyz': 2}  True
2        Sky                  None  Blue
Summary of the DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
0    3 non-null object
1    2 non-null object
2    3 non-null object
dtypes: object(3)
memory usage: 200.0+ bytes
```
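When the example program above calls dataFrame.info(memory_usage="deep") instead, the memory usage line of the summary changes as follows. The "+" suffix disappears because the figure now includes the objects stored in the columns:
memory usage: 723.0 bytes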
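To compare the two settings without copying them from the console, info() can write into any writable text buffer through its buf parameter. The sketch below captures the report in an io.StringIO object and keeps only the last line. The helper name memory_line is hypothetical and not part of pandas. The exact byte counts depend on the pandas and Python versions, so the sketch only inspects the format of the line.

```python
import io

import pandas as pd

# Same heterogeneous rows as in the example above.
hetero = [[1, 1.1, "A"],
          [(1, 2, 3), {"abc": 1, "xyz": 2}, True],
          ["Sky", None, "Blue"]]
dataFrame = pd.DataFrame(data=hetero)


def memory_line(df, **info_kwargs):
    """Hypothetical helper: run DataFrame.info() into a string buffer and
    return its last line, which is the "memory usage: ..." line."""
    buffer = io.StringIO()
    df.info(buf=buffer, **info_kwargs)
    return buffer.getvalue().strip().splitlines()[-1]


# Default: dtype-based estimate, marked with "+" because of object columns.
estimate = memory_line(dataFrame)
# "deep": also measures the Python objects stored in the object columns.
deep = memory_line(dataFrame, memory_usage="deep")

print(estimate)
print(deep)
```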