Create a simple HDF5 dataset using h5py

Overview:

  • HDF5 is a hierarchical data model for describing and storing data from sources that generate large volumes of data. Examples of such sources include sensors in a laboratory or factory, particle experiments, and terrestrial and extra-terrestrial experiments.
  • An HDF5 file can contain as many datasets as needed and is organized like the UNIX file system. Below the root level there can be many groups, and each group can hold several datasets. Both datasets and groups can have attributes that describe them.
  • h5py is the Python module that wraps the HDF5 library (written in the C programming language) and enables using HDF5 data models from Python programs.
  • The class File from the h5py module is used for creating an HDF5 file.
  • When creating the file, the following can be specified: a name, a mode, a driver, the version of the HDF5 library to be used, the size of the user block added to the beginning of the file, single-writer/multiple-reader (SWMR) mode, chunk-cache properties such as the chunk cache size, the chunk preemption policy and the number of chunk slots in the cache, and any keywords specific to the driver being used.
  • The file mode specifies the operations for which the file is opened. The modes are: read only on an existing file (r), read and write on an existing file (r+), create or overwrite (w), create only if there is no file with the same name (w- or x), and read and write if the file exists, create otherwise (a).
  • A driver can also be specified while opening an HDF5 file. The driver determines the kind of storage used for the file. The storage could be held fully in physical memory with an optional write-back to a disk file upon closing, a regular disk file similar to the file object opened by the Python open() function, a disk file with or without buffering or with limited buffering, or a disk file with chunking enabled.
  • The sec2 driver is the default driver. The sec2 driver creates a file on the disk with minimal buffering.
  • Once an HDF5 file is created, datasets can be added to it through the method create_dataset() of the h5py.File class.
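The hierarchy described above (root group, nested groups, datasets and attributes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the file name, the group names "lab" and "sensors", the dataset name "temperature" and the attribute values are all made up for this sketch. It also uses the in-memory "core" driver with backing_store=False so that nothing is written to disk.

```python
import h5py

# driver="core" with backing_store=False keeps the whole file in memory,
# so nothing is written to disk; the file name is only a label here.
with h5py.File("sketch_in_memory.hdf5", "w",
               driver="core", backing_store=False) as f:
    # Groups behave like directories below the root group "/"
    lab = f.create_group("lab")              # hypothetical group name
    sensors = lab.create_group("sensors")    # nested group: /lab/sensors

    # A dataset lives inside a group, much like a file in a directory
    temps = sensors.create_dataset("temperature", shape=(4,), dtype="f8")
    temps[:] = [20.5, 21.0, 21.5, 22.0]

    # Attributes are small pieces of metadata attached to groups or datasets
    sensors.attrs["location"] = "bench 3"
    temps.attrs["units"] = "celsius"

    # Read everything back while the file is still open
    root_name = f.name                       # name of the root group
    dataset_path = temps.name                # full path of the dataset
    units = f["/lab/sensors/temperature"].attrs["units"]
    driver_name = f.driver
    mean_temp = float(temps[:].mean())

print(root_name, dataset_path, units, driver_name, mean_temp)
```

Objects are addressed with slash-separated paths, which is what makes the layout feel like a UNIX file system. Switching to the default driver only requires dropping the driver and backing_store arguments.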

 

Example:

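# Example Python program that creates a volume dataset in HDF5 format
# using the Python module h5py
import h5py
import random

# Helper returning a random 0 or 1; it can be used in place of the
# constant fill value in the loop below
def getVal():
    return random.randint(0, 1)

volumeFileName = "VolumeData.hdf5"

# Mode "w" creates the file, overwriting any existing file of the same name
volumeFile = h5py.File(volumeFileName, "w")

# Create a volume; without an explicit dtype, h5py stores 32-bit floats
volumeShape = (1, 1, 1)
v1 = volumeFile.create_dataset("s1", shape=volumeShape)

# A second, larger volume can be created the same way
#volumeShape2 = (512, 512, 512)
#v2 = volumeFile.create_dataset("s2", shape=volumeShape2)

# Fill volume 1, visiting every index along each axis
for x in range(volumeShape[0]):
    for y in range(volumeShape[1]):
        for z in range(volumeShape[2]):
            v1[x, y, z] = 10

print("Volume filling complete")
print("Driver:")
print(volumeFile.driver)

# Close the file so that all data is flushed to disk
volumeFile.close()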

 

Output:

Volume filling complete
Driver:
sec2
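Filling a volume element by element becomes slow for large shapes such as (512, 512, 512). As a rough sketch, the same fill can be done with a single slice assignment, and the file can then be reopened to see how the modes "w", "w-" and "r" behave. The file name VolumeSketch.hdf5, the shape (4, 4, 4) and the values written are assumptions chosen for this illustration, and the file is placed in a temporary directory so nothing is left behind.

```python
import os
import tempfile

import h5py

# Work in a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "VolumeSketch.hdf5")   # hypothetical name

    # "w" creates the file (or overwrites an existing one)
    with h5py.File(path, "w") as f:
        vol = f.create_dataset("s1", shape=(4, 4, 4), dtype="f4")
        vol[...] = 10            # one slice assignment fills the whole volume
        vol[0, 0, 0] = 42        # single elements can still be set by index

    # "w-" refuses to create a file that already exists
    try:
        h5py.File(path, "w-").close()
        exclusive_create_failed = False
    except OSError:
        exclusive_create_failed = True

    # "r" opens the existing file read-only
    with h5py.File(path, "r") as f:
        shape = f["s1"].shape
        first = float(f["s1"][0, 0, 0])
        total = float(f["s1"][...].sum())   # 63 elements of 10 plus one 42
        mode = f.mode

print(shape, first, total, mode, exclusive_create_failed)
```

Using the file as a context manager closes it automatically, which is what allows it to be reopened in a different mode a few lines later.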

 

 


Copyright 2024 © pythontic.com