Overview:
- A pandas DataFrame is a two-dimensional, labeled data structure whose columns can hold different data types, including arbitrary Python objects.
- For the common numeric data types, pandas uses NumPy's ndarray as the underlying storage for both the pandas.DataFrame and pandas.Series classes.
- HDF5 is a standard file format for storing multi-dimensional data in a hierarchical fashion. An HDF5 dataset can have up to 32 dimensions.
- An HDF5 file can hold huge volumes of data in datasets organized into groups, with the root group at the top, much like a UNIX file system.
- HDF5 itself is independent of NumPy, but Python libraries such as PyTables and h5py read and write HDF5 datasets as NumPy ndarrays, which makes the format a natural fit for pandas.
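The relationship between a DataFrame and NumPy arrays described above can be sketched as follows. This is a minimal illustration, not part of the original example; the column names and values are made up for the purpose.

```python
import numpy as np
import pandas as pd

# A DataFrame whose columns hold different types (illustrative data).
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [0.7, 0.5, 0.8],
    "count": [1, 2, 3],
})

# Each column carries its own dtype.
print(df.dtypes)

# A single numeric column converts to a NumPy ndarray of that dtype.
scores = df["score"].to_numpy()
print(type(scores), scores.dtype)

# Converting the whole mixed-type frame falls back to dtype object.
mixed = df.to_numpy()
print(mixed.dtype, mixed.shape)
```

The exact dtype reported for the text column depends on the pandas version, so the sketch prints it rather than assuming a value.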
Exporting a pandas DataFrame to an HDF5 file:
- An HDF5 file is organized into groups, starting from the root group (/).
- The to_hdf() method exports a pandas DataFrame object to an HDF5 file.
- The key parameter names the HDF5 group under which the DataFrame is stored, for example /data/d1.
- Internally, to_hdf() uses the PyTables library to store the DataFrame in the HDF5 file, so PyTables (the tables package) must be installed.
- The pandas read_hdf() function reads a pandas object, such as a DataFrame or a Series, from an HDF5 file.
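To make the role of the key parameter more concrete, here is a small sketch that stores a DataFrame and a Series under two different groups of the same file and then lists the keys. It assumes PyTables is installed; the file name sketch.h5 and the key /data/s1 are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame([(0.7, 0.6, 0.4), (0.5, 0.6, 0.5)])
ser = pd.Series([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a temporary directory.
    h5_path = os.path.join(tmp_dir, "sketch.h5")

    # mode="w" creates a fresh file; mode="a" adds a second object to it.
    df.to_hdf(h5_path, key="/data/d1", mode="w")
    ser.to_hdf(h5_path, key="/data/s1", mode="a")

    # HDFStore lists the keys (group paths) stored in the file.
    with pd.HDFStore(h5_path, mode="r") as store:
        keys = sorted(store.keys())
    print(keys)

    # Each object is read back through the key it was written under.
    df_back = pd.read_hdf(h5_path, key="/data/d1")
    ser_back = pd.read_hdf(h5_path, key="/data/s1")

print(df_back)
print(ser_back)
```

Both objects end up under the /data group, which mirrors how directories hold files in a UNIX file system.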
Example:
# Example Python program that writes a pandas DataFrame
# into an HDF5 file
import pandas as pds

# Create a DataFrame from a 3x3 matrix
data = [(0.7, 0.6, 0.4),
        (0.5, 0.6, 0.5),
        (0.8, 0.5, 0.4)]
df = pds.DataFrame(data)

# Export the pandas DataFrame into HDF5 under the group /data/d1
h5File = "fromdf.h5"
df.to_hdf(h5File, key="/data/d1")

# Use pandas again to read the data from the HDF5 file into a DataFrame
df1 = pds.read_hdf(h5File, key="/data/d1")
print("DataFrame read from the HDF5 file through pandas:")
print(df1)
Output:
DataFrame read from the HDF5 file through pandas:
     0    1    2
0  0.7  0.6  0.4
1  0.5  0.6  0.5
2  0.8  0.5  0.4