Knowing the memory usage of DataFrame columns in pandas

Overview:

  • There are several container mechanisms in the Python ecosystem for storing and manipulating multiple elements. These containers hold elements, ranging from basic types like bool to arbitrary Python objects, in the main memory of a computer system. They include the arrays provided by the array module of the Python standard library, the ndarrays of the NumPy library, and specialized containers built on top of ndarrays, such as the Series and DataFrame classes of the pandas library.
  • During the lifetime of a Python program, these container instances can grow so large that the available main memory keeps shrinking, which creates a performance bottleneck.
  • Depending on the sheer size of the data under consideration, approaches like cluster computing can scale Python programs horizontally.
  • Regardless of whether a Python program runs on a computing cluster or on a single system, it is essential to measure the amount of memory consumed by major data structures such as a pandas DataFrame.
  • The memory_usage() method of the DataFrame class calculates, in bytes, the memory consumed by each column of a DataFrame instance and, by default, by its index. By default (deep=False), columns of dtype object, such as text columns, are measured only by the size of their references to the Python objects, so the result can underestimate their true footprint; passing deep=True includes the objects themselves.
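To make the list of containers above concrete, the following minimal sketch (not part of the original example) stores the same 1,000 float values in a standard-library array.array, a NumPy ndarray and a pandas Series, and asks each for the size of its data buffer. The variable names are illustrative only, and the sketch assumes the "d" typecode maps to an 8-byte C double, which holds on common platforms.

```python
import array

import numpy as np
import pandas as pd

# The same 1,000 float values held in three different containers
values = [float(i) for i in range(1000)]

stdlib_array = array.array("d", values)   # "d" = C double, 8 bytes on common platforms
nd = np.array(values, dtype=np.float64)
series = pd.Series(nd)

# array.array: size of one item multiplied by the number of items
array_bytes = stdlib_array.itemsize * len(stdlib_array)

# ndarray: nbytes is the size of the data buffer
ndarray_bytes = nd.nbytes

# Series: index=False counts only the values, not the index
series_bytes = series.memory_usage(index=False)

print(array_bytes, ndarray_bytes, series_bytes)
```

For plain 8-byte floats all three report 8000 bytes; the Series figure excludes its index only because index=False is passed, whereas DataFrame.memory_usage() includes the index by default, as the "Index" row in the output below shows.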

Example:

# Example Python program that computes the memory
# usage of its pandas DataFrame instances
import pandas as pds

# Read a CSV file, downloaded from Kaggle under CC BY-SA 4.0,
# into a pandas DataFrame
redditPosts = "/data/downloads/r_dataisbeautiful_posts.csv"
postFrame   = pds.read_csv(redditPosts, low_memory=False)

# Bytes used by the index and by each column, returned as a Series
memStats    = postFrame.memory_usage()

print("Memory consumption of each DataFrame column in bytes:")
print(memStats)
print("Memory consumption of the DataFrame instance in bytes:%d bytes" % memStats.sum())

# Here 1 MB is taken to mean 1024 * 1024 bytes
print("Memory consumption in megabytes(MB): %2.2f MB" % (memStats.sum() / 1024 / 1024))

Output:

Memory consumption of each DataFrame column in bytes:
Index                        128
id                       1471128
title                    1471128
score                    1471128
author                   1471128
author_flair_text        1471128
removed_by               1471128
total_awards_received    1471128
awarders                 1471128
created_utc              1471128
full_link                1471128
num_comments             1471128
over_18                   183891
dtype: int64
Memory consumption of the DataFrame instance in bytes:16366427 bytes
Memory consumption in megabytes(MB): 15.61 MB
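In the output above, every column except over_18 reports 1471128 bytes, which is 183891 rows × 8 bytes, while over_18 (presumably a bool column) reports 183891 bytes, one byte per row. For numeric columns the 8 bytes are the values themselves, but for object columns such as title they are only the references, so the default figure leaves out the text. The sketch below uses a small, made-up DataFrame (hypothetical data, not the Kaggle file) to contrast the default with deep=True; dtype=object is forced on the text column so that it is stored as references to Python str objects regardless of how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical sample data; "title" is forced to dtype object
frame = pd.DataFrame({
    "title": pd.Series(["short", "a considerably longer post title", "mid-length title"],
                       dtype=object),
    "score": [10, 250, 3],
    "over_18": [False, True, False],
})

# Default (deep=False): the object column counts only its references
shallow = frame.memory_usage(index=False)

# deep=True: also adds the size of each referenced Python object
deep = frame.memory_usage(index=False, deep=True)

print(shallow)
print(deep)
```

The score and over_18 figures are identical in both results, while title grows under deep=True. Calling postFrame.memory_usage(deep=True) on the Reddit posts frame would likewise report larger values for its text columns (the exact figures depend on the data and are not reproduced here), and DataFrame.info(memory_usage="deep") prints a comparable deep total in its summary.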

Copyright 2024 © pythontic.com