Create groups in an HDF5 file using h5py

Overview:

  • HDF5 is a specification and file format for storing very large volumes of data in a hierarchical structure.
  • In HDF5 the data is organized in a file. The file object acts as the / (root) group of the hierarchy. Similar to the UNIX file system, in HDF5 the datasets and their groups are organized as an inverted tree.
  • Several groups can be created under the / (root) group. A group can contain datasets and other groups (self-referencing links are possible as well).
  • A group at any level below the / (root) group in an HDF5 file is created by calling the method create_group() on an h5py File instance or on an existing Group instance.
  • The members of a group can be accessed using
    • Dictionary notation: parentobject["childname"], parentobject["level1/leveln/childname"]
    • parentobject.get("childname")
  • Datasets inside a group can be created using
    • Dictionary-style assignment: parentobject["datasetname"] = data
    • parentobject.create_dataset()
  • A group’s parent is given by its parent attribute.
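The access patterns listed above can be sketched briefly as follows. This is a minimal illustration rather than part of the original example: the names level1, level2, childname and datasetname are hypothetical, and the file is assumed to be opened in memory (driver="core" with backing_store=False) so that nothing is written to disk.

```python
import h5py

# Assumption for illustration: an in-memory file (driver="core",
# backing_store=False), so nothing is written to disk.
with h5py.File("access_sketch.hdf5", "w", driver="core", backing_store=False) as f:
    # Hypothetical names; intermediate groups are created automatically.
    child = f.create_group("level1/level2/childname")

    # Dictionary notation, one level at a time or with a full path
    step_by_step = f["level1"]["level2"]["childname"].name
    by_path = f["level1/level2/childname"].name

    # get() returns None (or a supplied default) when the member is missing
    found = f.get("level1/level2/childname").name
    missing = f.get("level1/no_such_member")

    # Dictionary-style assignment creates a dataset inside the group
    child["datasetname"] = 42
    members = list(child.keys())

    # The parent attribute gives the enclosing group
    parent_name = child.parent.name

print(step_by_step, by_path, found, missing, members, parent_name)
```

Unlike dictionary notation, which raises a KeyError for a missing member, get() is convenient when the presence of a member is not known in advance.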

 

Example:

# Example Python program that creates a hierarchy of groups
# and datasets in an HDF5 file using h5py
import h5py
import numpy.random

# Create an HDF5 file
hierarchicalFileName = "Hierarchical.hdf5"
hierarchicalFile = h5py.File(hierarchicalFileName, "w")

# Create a chain of nested groups under root, one level at a time
grp1 = hierarchicalFile.create_group("Group1")
grp2 = grp1.create_group("Group2")
grp3 = grp2.create_group("Group3")

# Use a POSIX-style path to create a hierarchy of groups under root
grp4 = hierarchicalFile.create_group("/GroupA/GroupB/GroupC")

# Use dictionary notation to create a dataset inside a group
grp4["D1"] = 1

# Create another dataset inside the same group
datasetShape = (10, 2)
d2 = grp4.create_dataset("D2", datasetShape)

# Print the groups, the datasets and the parent of each dataset
print(hierarchicalFile["/"])
print(grp1)
print(grp2)
print(grp3)
print(grp4)
print(grp4["D1"])
print(grp4["D1"].parent)
print(grp4["D2"])
print(grp4["D2"].parent)

# Add values to the D2 dataset, one element at a time
for a in range(d2.shape[0]):
    for b in range(d2.shape[1]):
        d2[a, b] = numpy.random.uniform(1, 1000)

print("Data from d2:")
# Read the values back from the dataset
for x in range(d2.shape[0]):
    for y in range(d2.shape[1]):
        print(d2[x, y])

# Close the file so that all the data is flushed to disk
hierarchicalFile.close()
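The nested loops above write and read D2 one element at a time. As a hedged alternative, the same dataset could be filled and read in bulk using slicing. The sketch below makes some assumptions that are not part of the original example: it uses an in-memory file (driver="core" with backing_store=False) and the hypothetical file name bulk_sketch.hdf5.

```python
import h5py
import numpy as np

# Assumption for illustration: an in-memory file, so nothing is written to disk.
with h5py.File("bulk_sketch.hdf5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("/GroupA/GroupB/GroupC")
    d2 = grp.create_dataset("D2", (10, 2))

    # Write the whole dataset in one assignment instead of element by element
    d2[...] = np.random.uniform(1, 1000, size=d2.shape)

    # Read the whole dataset back as a NumPy array
    values = d2[...]
    dataset_parent = d2.parent.name

print(values.shape, values.dtype, dataset_parent)
```

Bulk assignment generally involves far fewer calls into the HDF5 library than an element-wise loop, which tends to matter as datasets grow.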

 

Output:

<HDF5 group "/" (2 members)>
<HDF5 group "/Group1" (1 members)>
<HDF5 group "/Group1/Group2" (1 members)>
<HDF5 group "/Group1/Group2/Group3" (0 members)>
<HDF5 group "/GroupA/GroupB/GroupC" (2 members)>
<HDF5 dataset "D1": shape (), type "<i8">
<HDF5 group "/GroupA/GroupB/GroupC" (2 members)>
<HDF5 dataset "D2": shape (10, 2), type "<f4">
<HDF5 group "/GroupA/GroupB/GroupC" (2 members)>
Data from d2:
833.7155
16.872583
120.88334
745.5382
209.70691
331.48315
923.9745
869.833
809.82605
138.74417
523.5276
99.866295
276.7765
137.3819
673.0125
218.77142
832.29254
283.78696
302.44574
148.8316


Copyright 2024 © pythontic.com