- HDF5 (Hierarchical Data Format version 5) is a file format and specification for storing and organizing very large amounts of data in a hierarchical structure.
- In HDF5, data is stored in a file, and the file object acts as the / (root) group of the hierarchy. As in a UNIX file system, groups and datasets are organized as an inverted tree.
- Several groups can be created under the / (root) group. A group can contain datasets as well as other groups (self-referencing links are possible too).
- A group at any level below the / (root) group is created by calling create_group() on an h5py File object or on any existing Group object.
- The members of a group can be accessed using:
  - Dictionary notation: parentobject["childname"], or with a path, parentobject["level1/leveln/childname"]
  - The get() method: parentobject.get("childname")
- Datasets inside a group can be created using:
  - Dictionary notation, by assigning a value: parentobject["datasetname"] = value
  - The create_dataset() method: parentobject.create_dataset("datasetname", shape)
- The parent group of a group or dataset is available through its parent attribute.
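The access patterns in the list above can be tried out in a short, self-contained sketch. It assumes h5py and NumPy are installed and writes to a temporary file. The names members.hdf5, level1, level2, counts, zeros and up are hypothetical and chosen only for illustration. The soft link at the end is one way to build the kind of self-referencing link mentioned above: it points from a group back up to one of its ancestors.

```python
import os
import tempfile

import h5py
import numpy

# A temporary directory keeps the sketch self-contained; the file name
# "members.hdf5" and all group/dataset names below are illustrative only.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "members.hdf5")
    with h5py.File(path, "w") as f:
        # The File object is itself the root group "/"
        root_name = f.name
        file_is_group = isinstance(f, h5py.Group)

        # create_group() can be called on any group, not only on the File
        level1 = f.create_group("level1")
        level2 = level1.create_group("level2")

        # Two ways to create a dataset inside a group
        level2["counts"] = numpy.arange(5)            # dictionary notation
        level2.create_dataset("zeros", shape=(2, 3))  # explicit method

        # Dictionary notation takes a child name or a relative path
        child_name = level1["level2"].name
        counts = f["level1/level2/counts"][...]

        # get() returns None instead of raising KeyError for a missing member
        missing = level2.get("missing")

        # Every group and dataset exposes its parent group
        counts_parent = level2["counts"].parent.name
        level2_parent = level2.parent.name

        # A soft link pointing back up the tree creates a cycle
        level2["up"] = h5py.SoftLink("/level1")
        link_target = level2.get("up", getlink=True).path

print(root_name, file_is_group)      # / True
print(child_name)                    # /level1/level2
print(counts)                        # [0 1 2 3 4]
print(missing)                       # None
print(counts_parent, level2_parent)  # /level1/level2 /level1
print(link_target)                   # /level1
```

The values are copied into plain variables while the file is open because h5py objects can no longer be read once the file is closed.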
```python
# Example Python program that creates a hierarchy of groups
# and datasets in a HDF5 file using h5py
import h5py
import numpy.random

# Create a HDF5 file
hierarchicalFileName = "Hierarchical.hdf5"
hierarchicalFile = h5py.File(hierarchicalFileName, "w")

# Create a group under root, then nest two more groups below it
grp1 = hierarchicalFile.create_group("Group1")
grp2 = grp1.create_group("Group2")
grp3 = grp2.create_group("Group3")

# Use a POSIX-style path to create a chain of groups under root in one call
grp4 = hierarchicalFile.create_group("/GroupA/GroupB/GroupC")

# Use dictionary notation to create a dataset inside a group
grp4["D1"] = 1

# Create another dataset inside the same group
datasetShape = (10, 2)
d2 = grp4.create_dataset("D2", datasetShape)

# Print the groups and datasets, along with each dataset's parent
print(hierarchicalFile["/"])
print(grp1)
print(grp2)
print(grp3)
print(grp4)
print(grp4["D1"])
print(grp4["D1"].parent)
print(grp4["D2"])
print(grp4["D2"].parent)

# Add values to the D2 dataset, one element at a time
for a in range(d2.shape[0]):
    for b in range(d2.shape[1]):
        d2[a, b] = numpy.random.uniform(1, 1000, 1)[0]

print("Data from d2:")
# Read values from the dataset, one element at a time
for x in range(d2.shape[0]):
    for y in range(d2.shape[1]):
        print(d2[x, y])

# Close the file so that all data is flushed to disk
hierarchicalFile.close()
```
Output:

```
<HDF5 group "/" (2 members)>
<HDF5 group "/Group1" (1 members)>
<HDF5 group "/Group1/Group2" (1 members)>
<HDF5 group "/Group1/Group2/Group3" (0 members)>
<HDF5 group "/GroupA/GroupB/GroupC" (2 members)>
<HDF5 dataset "D1": shape (), type "<i8">
<HDF5 group "/GroupA/GroupB/GroupC" (2 members)>
<HDF5 dataset "D2": shape (10, 2), type "<f4">
<HDF5 group "/GroupA/GroupB/GroupC" (2 members)>
Data from d2:
833.7155
16.872583
120.88334
745.5382
209.70691
331.48315
923.9745
869.833
809.82605
138.74417
523.5276
99.866295
276.7765
137.3819
673.0125
218.77142
832.29254
283.78696
302.44574
148.8316
```
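The program above writes and reads D2 one element at a time, so every d2[a, b] access becomes a separate read or write against the file. The sketch below is a minimal alternative that assumes the same group path and dataset shape. It fills the dataset with a single slice assignment, reads the data back as one NumPy array, and then uses visititems() to walk every group and dataset below the root. The temporary directory is an addition so the sketch cleans up after itself.

```python
import os
import tempfile

import h5py
import numpy

# Rebuilds the GroupA/GroupB/GroupC part of the example in a temporary file
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "Hierarchical.hdf5")
    with h5py.File(path, "w") as f:
        grp4 = f.create_group("/GroupA/GroupB/GroupC")
        d2 = grp4.create_dataset("D2", (10, 2))  # default dtype is float32

        # Write all 20 values with one slice assignment instead of a nested loop
        values = numpy.random.uniform(1, 1000, size=d2.shape)
        d2[...] = values

        # Read the whole dataset back into a NumPy array in one call
        data = d2[...]

        # visititems() calls the function once per object below the group,
        # passing the object's path relative to the group and the object itself
        visited = []

        def record(name, obj):
            kind = "group" if isinstance(obj, h5py.Group) else "dataset"
            visited.append((name, kind))
            # Returning None lets the walk continue

        f.visititems(record)

for name, kind in visited:
    print(kind, name)
print(data.shape, data.dtype)
```

visititems() does not report the group it is called on (here the root), and if the callback returns any value other than None, the walk stops early.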