- The Python ecosystem offers several container mechanisms for storing and manipulating collections of elements. These containers hold elements in the main memory of a computer system, ranging from basic types such as bool to arbitrary class instances. They include arrays from the Python standard library (the array module), the ndarray of the NumPy library, and specialized containers built on top of ndarrays, such as the Series and DataFrame classes of the pandas library.
- Over the lifetime of a Python program, these container instances can grow to the point where the available main memory dwindles and becomes a performance bottleneck.
- Depending on the sheer size of the data under consideration, techniques like cluster computing can scale Python programs horizontally.
- Regardless of whether a Python program runs on a computing cluster or on a single system, it is essential to measure the amount of memory consumed by its major data structures, such as a pandas DataFrame.
- The memory_usage() method of the DataFrame class reports the column-wise memory consumption of a DataFrame instance.
# Example Python program that computes the memory
# consumption of a pandas DataFrame
import pandas as pd

# Read a CSV file downloaded from kaggle
# ("dataset.csv" is a placeholder file name)
df = pd.read_csv("dataset.csv")

print("Memory consumption of each DataFrame column in bytes:")
print(df.memory_usage())

totalBytes = df.memory_usage().sum()
print("Memory consumption of the DataFrame instance in bytes:%d bytes" % totalBytes)
print("Memory consumption in megabytes(MB): %.2f MB" % (totalBytes / (1024 * 1024)))

Output:

Memory consumption of each DataFrame column in bytes:
Memory consumption of the DataFrame instance in bytes:16366427 bytes
Memory consumption in megabytes(MB): 15.61 MB
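One caveat worth noting: by default, memory_usage() counts only the fixed-size buffer of each column. For object-dtype columns (for example, Python strings), the actual objects live outside the ndarray buffer and are counted only when deep=True is passed. A minimal self-contained sketch of the difference (the column names and data here are illustrative, not from the CSV example above):

```python
import pandas as pd

# A small DataFrame with one numeric and one string (object-dtype) column;
# the data is illustrative only.
df = pd.DataFrame({
    "id": pd.Series(range(3), dtype="int64"),
    "name": pd.Series(["a", "bb", "ccc"], dtype=object),
})

shallow = df.memory_usage()        # object column counted as pointers only
deep = df.memory_usage(deep=True)  # also counts the string objects themselves

print(shallow)
print(deep)
print("Total (deep) bytes:", deep.sum())
```

The deep figure for the "name" column is larger than the shallow one, since it introspects each Python string. Passing index=False excludes the index from the report, and DataFrame.info(memory_usage="deep") prints a similar per-frame summary.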