Example

We can start by creating two data packages on the test node to nest beneath a parent. These data packages contain measurements taken from Lake E1 in Alaska in 2013 and 2014.

First, load the libraries and set up the Arctic Data Center test and production nodes.

library(dataone)
library(arcticdatautils)
library(EML)

cn_staging <- CNode('STAGING')
adc_test <- getMNode(cn_staging,'urn:node:mnTestARCTIC')

cn <- CNode('PROD')
adc <- getMNode(cn, 'urn:node:ARCTIC')

We will re-create the following parent package on the test node with two of its children: https://arcticdata.io/catalog/#view/urn:uuid:799b7a86-cb1c-497c-a05a-d73492915cad. First, we will copy two of the children to the test node. Before you start, make sure your token for the test node has not expired.

from <- dataone::D1Client("PROD", "urn:node:ARCTIC")
to <- dataone::D1Client("STAGING", "urn:node:mnTestARCTIC")

child_pkg_1 <- datamgmt::clone_package('resource_map_doi:10.18739/A2KS1R',
                                       from = from, to = to,
                                       add_access_to = arcticdatautils:::get_token_subject(),
                                       change_auth_node = TRUE, new_pid = TRUE)

child_pkg_2 <- datamgmt::clone_package('resource_map_doi:10.18739/A2QK29',
                                       from = from, to = to,
                                       add_access_to = arcticdatautils:::get_token_subject(),
                                       change_auth_node = TRUE, new_pid = TRUE)

These two packages correspond to data from the same study, varying only by year; however, they currently exist on the test node as independent entities. We will associate them with each other by nesting them underneath a parent.

Now, let’s create a parent metadata file. Read in the metadata file (EML) of one of the children. We can download an object from a node in binary format using dataone::getObject(). Once it’s downloaded, we just need to convert it to the proper format, in this case EML, using EML::read_eml().

doc_parent <- read_eml(getObject(adc_test, child_pkg_1$metadata))
## View the title 
doc_parent$dataset$title

The title of this child contains “2012-2013”. This is too specific for the parent, as the temporal range of both children is 2012-2014. The parent should encompass this larger time range.

doc_parent$dataset$title <- 'Time series of water temperature, specific conductance, and oxygen from Lake E1, North Slope, Alaska, 2012-2014'

Like the title, the temporal coverage elements in this EML need to be adjusted.

new_end_date <- "2014-09-20"
doc_parent$dataset$coverage$temporalCoverage$rangeOfDates$endDate$calendarDate <- new_end_date
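
As a rough illustration of the assignment above, the sketch below uses a plain nested list as a stand-in for the EML document (no packages or network access needed). The begin date and the original end date are made-up placeholders, not the values in the real record. After updating the end date, it checks that the range is still ordered correctly.

```r
# Stand-in for the temporalCoverage part of an EML document (a plain nested list).
# ASSUMPTION: both dates below are placeholders, not the values in the real record.
doc <- list(dataset = list(coverage = list(temporalCoverage = list(
  rangeOfDates = list(
    beginDate = list(calendarDate = "2012-08-01"),
    endDate   = list(calendarDate = "2013-09-20")
  )
))))

# Extend the end date so the parent spans both children
doc$dataset$coverage$temporalCoverage$rangeOfDates$endDate$calendarDate <- "2014-09-20"

# Sanity check: the begin date must not fall after the end date
range_of_dates <- doc$dataset$coverage$temporalCoverage$rangeOfDates
begin <- as.Date(range_of_dates$beginDate$calendarDate)
end   <- as.Date(range_of_dates$endDate$calendarDate)
stopifnot(begin <= end)
```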

Remove dataTables and otherEntitys from the metadata. If you recall from previous chapters, dataTables contain metadata associated with data files (generally CSVs) and otherEntitys contain metadata about any other files in the data package (for instance a README or coding script). Because the parent does not contain any data objects, we want to remove dataTables and otherEntitys from the metadata file. In this instance, the E1 2013 metadata only contain dataTables. We can remove these by setting the dataTable element in the EML to NULL.

doc_parent$dataset$dataTable <- NULL
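
If a child's metadata also contained otherEntitys, the same approach would apply to that element. The sketch below shows the idea on a plain nested list standing in for an EML document; the entity names are hypothetical and exist only for illustration.

```r
# Plain list standing in for an EML document.
# ASSUMPTION: the title and entity names here are invented for this example.
doc <- list(dataset = list(
  title       = "Example parent",
  dataTable   = list(list(entityName = "a.csv"), list(entityName = "b.csv")),
  otherEntity = list(entityName = "README.txt")
))

# Assigning NULL to a list element removes that element entirely
doc$dataset$dataTable   <- NULL
doc$dataset$otherEntity <- NULL

# Only the non-entity elements remain
names(doc$dataset)
```

After both assignments, the dataset no longer has dataTable or otherEntity elements, while the title is untouched.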

In this case, the abstract, contacts, creators, geographicDescription, and methods are already generalized and do not require changes.

Before writing your parent EML, make sure that it validates. This is just a check to make sure everything is in the correct format.

eml_validate(doc_parent)

After your EML validates, we need to save, or “write”, it as a new file. You can think of this process like using “Save as” in Microsoft Word: we opened a file (the E1 2013 metadata), made some changes, and are now saving it as a new file. Here we write the parent EML to a temporary directory as “science_metadata.xml”; you could instead write it to a directory in your home folder.

# Save the EML to a temporary file
eml_path <- file.path(tempdir(), 'science_metadata.xml')
write_eml(doc_parent, eml_path)

Next, we will publish the parent metadata to the test node.

metadata_parent <- publish_object(adc_test, 
                                  path = eml_path, 
                                  format_id = format_eml())

Finally, we create a resource map for the parent package. We nest the two child data packages using the child_pids argument in create_resource_map(). Note that these child_pids are PIDs for the resource maps of the child packages, NOT the metadata PIDs.

resource_map_parent <- create_resource_map(adc_test, 
                                           metadata_pid = metadata_parent,
                                           child_pids = c(child_pkg_1$resource_map,
                                                          child_pkg_2$resource_map))

The child packages are now nested underneath the parent.
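
Because it is easy to pass metadata PIDs where resource map PIDs are expected, a small check before calling create_resource_map() may be worthwhile. The sketch below uses hypothetical lists in place of the objects returned by clone_package(); the PIDs are invented for illustration, and only the metadata and resource_map elements used earlier in this example are assumed.

```r
# Hypothetical stand-ins for the lists returned by clone_package().
# ASSUMPTION: these PIDs are invented; real ones come from the test node.
child_pkg_1 <- list(metadata     = "urn:uuid:meta-0001",
                    resource_map = "resource_map_urn:uuid:rm-0001")
child_pkg_2 <- list(metadata     = "urn:uuid:meta-0002",
                    resource_map = "resource_map_urn:uuid:rm-0002")

child_pids    <- c(child_pkg_1$resource_map, child_pkg_2$resource_map)
metadata_pids <- c(child_pkg_1$metadata, child_pkg_2$metadata)

# The child PIDs should be resource maps: none may match a metadata PID,
# and each child should appear only once
stopifnot(
  length(intersect(child_pids, metadata_pids)) == 0,
  anyDuplicated(child_pids) == 0
)
```

This check only catches accidental reuse of the metadata PIDs from the same objects. It does not confirm that the PIDs exist on the node.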