Overview:
- HDF5 is a hierarchical data model and file format for describing and storing data from sources that generate large volumes of data. Examples of such sources include sensors in a laboratory or factory, particle experiments, and terrestrial and extraterrestrial experiments.
- An HDF5 file can contain as many datasets as needed and is organized like a UNIX file system. Below the root group there can be many groups, and each group can hold several datasets. Both datasets and groups can have attributes that describe them.
- h5py is the Python module that wraps the HDF5 library (written in the C programming language) and makes the HDF5 data model available to Python programs.
- The class File from the h5py module is used for creating or opening an HDF5 file.
- When a file is created, the following can be specified: its name, the mode, the driver, the HDF5 library version bounds to be used (libver), the size of the user block added at the beginning of the file (userblock_size), single-writer/multiple-reader mode (swmr), chunk-cache properties such as the cache size (rdcc_nbytes), the chunk preemption policy (rdcc_w0) and the number of chunk slots in the cache (rdcc_nslots), and any keywords specific to the driver being used.
- The file mode specifies the operations for which the file is opened. The modes are: read and write if the file exists, create it otherwise (a); read only on an existing file (r); read and write on an existing file (r+); create the file, overwriting any existing one (w); and create only if there is no file with the same name (w-).
- A driver can also be specified while opening an HDF5 file. The driver determines the kind of storage used for the file. The storage can be held entirely in physical memory with an optional write-back to a disk file upon closing, a regular disk file similar to the file object opened by the Python open() function, a disk file with or without buffering or with limited buffering, or a disk file with chunking enabled.
- The sec2 driver is the default driver on most platforms. It creates a file on disk with minimal buffering.
- Once an HDF5 file is created, datasets can be added to it through the create_dataset() method of the h5py.File class.
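The file modes listed above can be illustrated with a minimal sketch. The file name modes_demo.hdf5, the dataset names and the float32/int32 element types are assumptions made for this illustration only; the file is created in a temporary directory so nothing is left in the working directory.

```python
import os
import tempfile

import h5py

# Work in a throwaway directory; the file name is hypothetical.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.hdf5")

# "w": create the file, overwriting any existing file of that name.
with h5py.File(path, "w") as f:
    f.create_dataset("s1", shape=(2, 2, 2), dtype="f4")

# "w-": create only if the file does not exist; here it already does,
# so the call is expected to fail with an OSError.
try:
    h5py.File(path, "w-")
    exclusive_create_failed = False
except OSError:
    exclusive_create_failed = True

# "r": read only on an existing file.
with h5py.File(path, "r") as f:
    read_only_shape = f["s1"].shape

# "r+": read and write on an existing file.
with h5py.File(path, "r+") as f:
    f["s1"][0, 0, 0] = 10.0
    updated_value = float(f["s1"][0, 0, 0])

# "a": read and write if the file exists, create it otherwise.
with h5py.File(path, "a") as f:
    f.create_dataset("s2", shape=(1,), dtype="i4")
    names = sorted(f.keys())

print(exclusive_create_failed, read_only_shape, updated_value, names)
```

Opening the file through a with statement closes it automatically, which is why no explicit close() call appears in the sketch.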
Example:
# Example Python program that creates volume datasets in HDF5 format
# using the Python module h5py
import h5py
import random

def getVal():
    return random.randint(0, 1)

volumeFileName = "VolumeData.hdf5"
volumeFile = h5py.File(volumeFileName, "w")

# Create the volumes (the second, larger volume is left commented out)
volumeShape = (1, 1, 1)
v1 = volumeFile.create_dataset("s1", shape=volumeShape)

#volumeShape = (512, 512, 512)
#v2 = volumeFile.create_dataset("s2", shape=volumeShape)

# Fill volume 1, visiting every element along each axis
for x in range(volumeShape[0]):
    for y in range(volumeShape[1]):
        for z in range(volumeShape[2]):
            v1[x, y, z] = 10

print("Volume filling complete")
print("Driver:")
print(volumeFile.driver)

# Close the file so that all data is flushed to disk
volumeFile.close()
Output:
Name of the HDF5 file:./One.hdf5
Name of the root group:/
Shape of the dataset s1: (10, 10, 10)
81.0
104.0
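The hierarchy described in the overview (root group, groups, datasets and attributes) and the in-memory storage option can be sketched together as follows. This is only an illustration under stated assumptions: the file name hierarchy_demo.hdf5, the group name scans, the attribute names and values, and the float32 element type are all made up for the sketch. It uses the core driver with backing_store=False so that the data stays in memory, and it fills the volume with a single slice assignment rather than nested loops.

```python
import h5py
import numpy as np

# driver="core" keeps the file in memory; backing_store=False asks h5py
# not to write it back to disk on close. The file name is hypothetical.
with h5py.File("hierarchy_demo.hdf5", "w",
               driver="core", backing_store=False) as f:
    # Groups behave like directories below the root group "/".
    scans = f.create_group("scans")

    # A dataset inside the group; float32 is an assumed element type.
    volume = scans.create_dataset("s1", shape=(10, 10, 10), dtype="f4")

    # Slice assignment fills the whole volume in one call.
    volume[...] = 10.0

    # Attributes describe groups and datasets (illustrative values).
    scans.attrs["source"] = "laboratory sensor"
    volume.attrs["units"] = "counts"

    # Collect a few facts while the file is still open.
    root_name = f.name
    dataset_path = volume.name
    dataset_shape = volume.shape
    mean_value = float(np.mean(volume[...]))
    driver_name = f.driver

print(root_name, dataset_path, dataset_shape, mean_value, driver_name)
```

For large volumes, slice assignment is usually much faster than element-by-element loops, because each indexed write goes through the HDF5 library separately.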