Purpose:
Brief overview of topic and relationship to scientific synthesis.
At the end of this 30 minute overview you will be able to: 1. Explain the general concept of a hierarchical data structure 2. List 2-3 features of a H5 file (compressed data, embedded metadata, can store heterogeneous formats)
There are several different hierarchical data formats out there. Today we will discuss 3 of the main ones:
All hierarchical data structures have shared features including
Hierarchical data structures allow us to support different types of data in one single file. You can think of them as a directory of objects stored in a single file.
The most flexible and newest structure is H5 / hdf5 which is actually the base structure for many other formats that you may have used including
The HDF5 structure supports many features including
hdf4 is one of the early hierarchical data structures. One common use of H4 is seen in the MODIS satellite data. There is a known structure associated with these data and the data can be opened in R using the gdalutils
package.
The Hierarchical Data Format version 5 (HDF5), is an open source file format that supports large, complex, heterogeneous data. HDF5 uses a “file directory” like structure that allows you to organize data within the file in many different structured ways, as you might do with files on your computer. The HDF5 format also allows for embedding of metadata making it self-describing.
netCDF is actually an HDF5 file under the hood! netcdf was developed by the UniData group at NCAR in Boulder, Colorado with a focus on storing large spatio temporal climate data.
As wonderful as H5 is, one of the great challenges with a data structure that is so flexible is that there aren’t currently discrete standards to store things like spatial data.
Thus you can’t simply open an H5 file in a tool like arcGIS or QGIS. Rather, the data require some processing to spatially locate it - assuming that it contain all of the correct metadata to spatially locate it!
On the other hand, we can open netCDF files that store climate data given existing and widely used standards associated with how the data are stored. For instance, we can import a netcdf file into R using the raster() function!
In the next portion of this lesson, we will explore together the Hdf5 structure using the free, java based H5 viewer with the goal of better understanding some of the characteristics of a Hierarchical data structure.