Exporting a pandas DataFrame to an HDF5 File

Overview:

  • A pandas DataFrame is a two-dimensional data structure that can hold heterogeneous Python objects.
  • The pandas library uses NumPy's ndarray as the underlying storage for both the pandas.DataFrame and pandas.Series classes.
  • HDF5 is a file format and data model for storing multi-dimensional data hierarchically. An HDF5 dataset can have up to 32 dimensions.
  • An HDF5 file can hold large volumes of data in datasets organized into groups, with the root group at the top, much like a UNIX file system.
  • HDF5 itself is independent of NumPy, but Python HDF5 libraries such as PyTables read and write dataset contents as NumPy ndarrays, which is why the format pairs well with pandas.
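
To make the first two points concrete, here is a minimal sketch showing that one DataFrame can hold columns of different types and that a column can be obtained as a NumPy array. The column names and values are hypothetical sample data, not taken from the example later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three columns, each with a different type
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [0.7, 0.5, 0.8],
    "count": [1, 2, 3],
})

# Each column keeps its own dtype
print(df.dtypes)

# A numeric column can be obtained as a NumPy ndarray
scores = df["score"].to_numpy()
print(type(scores), scores.dtype)
```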

Exporting a pandas DataFrame to an HDF5 file:

  • An HDF5 file is organized into groups, starting from the root group, /.
  • The to_hdf() method exports a pandas DataFrame object to an HDF5 file.
  • The HDF5 group under which the DataFrame is stored is specified through the key parameter.
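
As a rough illustration of how the key parameter maps to groups, the sketch below stores two DataFrames under different keys in the same file and then lists the keys. The file name, keys, and column names are hypothetical, and the code assumes the PyTables package (tables) is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file location, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "groups_demo.h5")

df_a = pd.DataFrame({"x": [1.0, 2.0]})
df_b = pd.DataFrame({"y": [3.0, 4.0]})

# mode="w" creates a fresh file; mode="a" appends to the existing file
df_a.to_hdf(path, key="/data/d1", mode="w")
df_b.to_hdf(path, key="/data/d2", mode="a")

# HDFStore exposes the keys (group paths) stored in the file
with pd.HDFStore(path, mode="r") as store:
    keys = sorted(store.keys())

print(keys)
```

Both DataFrames end up under the /data group, each under its own key, so either one can later be read back independently.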

  • The to_hdf() method internally uses the PyTables library to store the DataFrame in an HDF5 file.
  • The read_hdf() method reads a pandas object, such as a DataFrame or a Series, from an HDF5 file.
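
Since read_hdf() is not limited to DataFrames, a Series round trip can be sketched in the same way. The file name, key, and values below are hypothetical, and PyTables is again assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file location and key, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "series_demo.h5")

s = pd.Series([0.7, 0.5, 0.8], name="score")

# Series objects also provide to_hdf()
s.to_hdf(path, key="scores", mode="w")

# read_hdf() returns the same kind of object that was stored
restored = pd.read_hdf(path, key="scores")

print(type(restored).__name__)
print(restored.equals(s))
```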

Example:

# Example Python program that writes a pandas DataFrame
# to an HDF5 file and reads it back
import pandas as pd

# Create a DataFrame from a 3x3 matrix
data = [(0.7, 0.6, 0.4),
        (0.5, 0.6, 0.5),
        (0.8, 0.5, 0.4)]

df = pd.DataFrame(data)

# Export the pandas DataFrame to HDF5 (requires the PyTables package).
# key is passed by keyword; recent pandas versions no longer accept it positionally.
h5_file = "fromdf.h5"
df.to_hdf(h5_file, key="/data/d1")

# Use pandas again to read the data from the HDF5 file into a DataFrame
df1 = pd.read_hdf(h5_file, key="/data/d1")
print("DataFrame read from the HDF5 file through pandas:")
print(df1)

Output:

DataFrame read from the HDF5 file through pandas:
     0    1    2
0  0.7  0.6  0.4
1  0.5  0.6  0.5
2  0.8  0.5  0.4


Copyright 2024 © pythontic.com