Getting the ndarray of a pandas DataFrame

Overview:

pandas makes data processing easier by starting from one simple assumption: most of the data processed by present-day software systems is one-dimensional or two-dimensional. Data with more dimensions exists, and some problems call for higher-dimensional abstractions, but one-dimensional and two-dimensional data are the two abstractions used most often.

With the classes Series and DataFrame, pandas provides versatile and powerful capabilities for processing one-dimensional and two-dimensional data. In terms of volume, such data can be very large.
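As a minimal sketch of the two abstractions, the snippet below builds one Series and one DataFrame and prints their number of dimensions and shape. The variable names and sample values are made up for illustration only.

```python
import pandas as pd

# One-dimensional data: a Series holds a single labelled sequence of values.
temperatures = pd.Series([21.5, 22.0, 19.8], name="temperature")

# Two-dimensional data: a DataFrame holds rows and labelled columns.
readings = pd.DataFrame({"temperature": [21.5, 22.0, 19.8],
                         "humidity": [40, 42, 55]})

print(temperatures.ndim, temperatures.shape)  # 1 (3,)
print(readings.ndim, readings.shape)          # 2 (3, 2)
```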

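To support such large volumes of data, the Series and DataFrame classes are built on NumPy's ndarray as their underlying data structure. A developer can access the data as an ndarray through the DataFrame attribute values or through the method to_numpy(); the pandas documentation recommends to_numpy(). When all columns share one dtype, the returned array can be a view of the DataFrame's own data. When the dtypes are mixed, pandas has to build a new array with a common dtype.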
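The following sketch shows both access paths side by side, using small made-up frames (the names numbers and mixed are assumptions for illustration). It also shows what happens with mixed column types, and how the copy=True parameter of to_numpy() requests an array that is independent of the DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every column holds integers.
numbers = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Both access paths return a numpy.ndarray with the frame's shape.
via_method = numbers.to_numpy()
via_attribute = numbers.values
print(type(via_method).__name__, via_method.shape)  # ndarray (3, 2)
print(np.array_equal(via_method, via_attribute))    # True

# With mixed column types, pandas falls back to a common dtype.
mixed = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
print(mixed.to_numpy().dtype)                       # object

# copy=True asks for an array that does not share memory with the frame,
# so writing to it leaves the DataFrame unchanged.
independent = numbers.to_numpy(copy=True)
independent[0, 0] = 99
print(numbers.loc[0, "a"])                          # 1
```

Whether a plain to_numpy() call returns a view or a copy depends on the dtypes involved and on the pandas version, so code that must not modify the DataFrame is safer with copy=True.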

Example:

# Example Python program that prints the underlying ndarray
# of a pandas DataFrame
import pandas as pds

# Binary data
bits = [(0, 0, 0, 0, 0, 0, 0, 0),
        (1, 0, 0, 0, 0, 0, 0, 1),
        (1, 1, 0, 0, 0, 0, 1, 1),
        (1, 1, 1, 0, 0, 1, 1, 1),
        (1, 1, 1, 1, 1, 1, 1, 1),
        (1, 1, 1, 0, 0, 1, 1, 1),
        (1, 1, 0, 0, 0, 0, 1, 1),
        (0, 0, 0, 0, 0, 0, 0, 0)]

# Make a DataFrame from the binary data
bitFrame = pds.DataFrame(data=bits)

# Get the underlying ndarray
bitCol = bitFrame.to_numpy()
print("The underlying numpy array:")
print(bitCol)
print(type(bitCol))

# Print the memoryview that exposes the buffer of this ndarray.
# The address in the output identifies the memoryview object itself,
# not the start of the array's data.
print("The buffer location:%s" % bitCol.data)

# Print the same for the array returned by the values attribute.
# Matching addresses here do not prove that to_numpy() returned a view;
# numpy.shares_memory() is the reliable check.
print("The buffer location in the original DataFrame:%s" % bitFrame.values.data)

print("Memory layout information:")
print(bitCol.flags)

Output:

The underlying numpy array:
[[0 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 1]
 [1 1 0 0 0 0 1 1]
 [1 1 1 0 0 1 1 1]
 [1 1 1 1 1 1 1 1]
 [1 1 1 0 0 1 1 1]
 [1 1 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0]]
<class 'numpy.ndarray'>
The buffer location:<memory at 0x10f0551e0>
The buffer location in the original DataFrame:<memory at 0x10f0551e0>
Memory layout information:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
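
The address shown for the data attribute in the output belongs to a temporary memoryview object, so two equal addresses can simply mean that Python reused the same memory for two short-lived objects. A more dependable way to compare buffers, sketched below on a small made-up frame, is numpy.shares_memory() together with the data address from __array_interface__. The sketch also shows numpy.ascontiguousarray(), which is one way to obtain a row-major array when the flags report F_CONTIGUOUS rather than C_CONTIGUOUS. The frame and variable names are assumptions for illustration, and whether the first check prints True depends on the pandas version and the dtypes involved.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with a single integer dtype.
frame = pd.DataFrame([(0, 1), (1, 0), (1, 1)])

first = frame.to_numpy()
second = frame.values

# Compare the actual data buffers instead of memoryview object addresses.
print(np.shares_memory(first, second))

# Address of the first element in each array's data buffer.
print(hex(first.__array_interface__["data"][0]))
print(hex(second.__array_interface__["data"][0]))

# An explicit copy never shares memory with the original array.
detached = frame.to_numpy(copy=True)
print(np.shares_memory(first, detached))   # False

# ascontiguousarray returns a row-major (C-contiguous) array,
# copying the data only when the input is not already row-major.
row_major = np.ascontiguousarray(first)
print(row_major.flags["C_CONTIGUOUS"])     # True
```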

 


Copyright 2024 © pythontic.com