This chapter will teach you how to edit and update an existing data package in R. Earlier, we updated metadata. In this section we will learn how to update a data file, and how to update a package by adding an additional data file.
To update a data file associated with a data package, you need to do three things:
- update the object itself,
- update the resource map (which affiliates the object with the metadata), and
- update the metadata that describes that object
The datapack::replaceMember function takes care of the first two of these tasks. First, get the pid of the file you want to replace using datapack::selectMember:
metadataId <- selectMember(dp, name="sysmeta@formatId", value="https://eml.ecoinformatics.org/eml-2.2.0")
Then pass that pid and the path to the replacement file to replaceMember:
dp <- replaceMember(dp, metadataId, replacement=file_path)
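The same select-then-replace pattern applies to a data file. The following is a minimal sketch, assuming the datapack package is installed. It builds a throwaway local package from two temporary CSV files so that it runs without connecting to a member node; the file contents and the names old_csv, new_csv, and dataId are made up for illustration.

```r
library(datapack)

# Two small CSV files: the original and an edited version (illustrative data).
old_csv <- tempfile(fileext = ".csv")
new_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(site = c("A", "B"), depth_m = c(1.2, 3.4)), old_csv, row.names = FALSE)
write.csv(data.frame(site = c("A", "B"), depth_m = c(1.2, 3.5)), new_csv, row.names = FALSE)

# Build a local package with one CSV member (no upload happens here).
dp <- new("DataPackage")
dp <- addMember(dp, new("DataObject", format = "text/csv", filename = old_csv))

# Get the pid of the CSV member, then swap in the edited file.
dataId <- selectMember(dp, name = "sysmeta@formatId", value = "text/csv")
dp <- replaceMember(dp, dataId, replacement = new_csv)
```

If a package holds several files with the same formatId, selectMember returns more than one pid, so check the result before passing it to replaceMember.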
To remove files from the data package, use datapack::removeMember. For example, to remove all the zip files associated with this data package:
zipId <- selectMember(dp, name="sysmeta@formatId", value="application/vnd.shp+zip")
dp <- removeMember(dp, zipId, removeRelationships = TRUE)
You will need to be explicit about the formatId, which depends on the file type. A list of format IDs can be found on the DataONE website. Use line 2 (Id:) exactly, character for character.
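Because the match must be exact, it can help to check a formatId before using it. The sketch below is only an illustration: known_format_ids is a hand-typed subset (not the full DataONE list), and check_format_id is a hypothetical helper, not part of datapack or dataone.

```r
# Illustrative subset only; the authoritative list lives on the DataONE website.
known_format_ids <- c(
  "text/csv",
  "application/vnd.shp+zip",
  "https://eml.ecoinformatics.org/eml-2.2.0"
)

# Hypothetical helper: stop early if the ID is not an exact, case-sensitive match.
check_format_id <- function(format_id, known = known_format_ids) {
  if (!format_id %in% known) {
    stop("Unrecognised formatId: '", format_id,
         "'. Copy the Id exactly from the DataONE list.")
  }
  invisible(format_id)
}

check_format_id("text/csv")    # passes silently
# check_format_id("text/CSV")  # would stop: the case differs
```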
To accomplish the third task, you will need to update the metadata using the EML package. This is covered in Chapter 4. After you update a file, you will always need to update the metadata because parts of the physical section (such as the file size and checksum) will be different, and the file may also require different attribute information.
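To see why the physical section goes stale, compare a file before and after an edit. This is a minimal base R sketch with made-up data. MD5 is used only for illustration, since the checksum algorithm recorded in your EML may differ, and file_summary is a hypothetical helper.

```r
# Write an original and an edited copy of a small CSV to temporary files.
original_path <- tempfile(fileext = ".csv")
edited_path <- tempfile(fileext = ".csv")
df <- data.frame(site = c("A", "B"), depth_m = c(1.2, 3.4))
write.csv(df, original_path, row.names = FALSE)
df$notes <- c("ok", "recheck")  # a new column changes the size and the attribute list
write.csv(df, edited_path, row.names = FALSE)

# Hypothetical helper: collect the values that the metadata would need to reflect.
file_summary <- function(path) {
  data.frame(
    size_bytes = file.size(path),
    md5 = unname(tools::md5sum(path)),
    n_columns = ncol(read.csv(path)),
    stringsAsFactors = FALSE
  )
}

before <- file_summary(original_path)
after <- file_summary(edited_path)
rbind(before, after)
```

The size, the checksum, and the number of columns all differ between the two rows, which is why the physical and attribute information both need to be revisited.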
Once you have updated the data objects and saved the updated metadata to a file, you can use replaceMember to update the package with the new metadata.
Make sure you have the package you want to update loaded into R using dataone::getDataPackage().
Now you can update your data package to include the new data object. This assumes you have already loaded and updated your data package with something like the following:
d1c_test <- dataone::D1Client("STAGING", "urn:node:mnTestARCTIC")
packageId <- "the resource map"
dp <- getDataPackage(d1c_test, identifier=packageId, lazyLoad=TRUE, quiet=FALSE)
metadataId <- selectMember(dp, name="sysmeta@formatId", value="https://eml.ecoinformatics.org/eml-2.2.0")
#some modification to the EML here
eml_path <- "path/to/your/saved/eml.xml"
write_eml(doc, eml_path)
dp <- replaceMember(dp, metadataId, replacement=eml_path)
You can then upload your data package:
myAccessRules <- data.frame(subject="CN=arctic-data-admins,DC=dataone,DC=org", permission="changePermission")
packageId <- uploadDataPackage(d1c_test, dp, public=FALSE, accessRules=myAccessRules, quiet=FALSE)
If a package is ready to be public, you can change the public argument in the dataone::uploadDataPackage() call to TRUE.
If you want to publish with a DOI (Digital Object Identifier) instead of a UUID (Universally Unique Identifier), you need to do this when replacing the metadata. This should only be done after the package is finalized and has been thoroughly reviewed!
doi <- dataone::generateIdentifier(d1c_test@mn, "DOI")
dp <- replaceMember(dp, metadataId, replacement=eml_path, newId=doi)
newPackageId <- uploadDataPackage(d1c_test, dp, public=TRUE, quiet=FALSE)
If there is a pre-issued DOI (the researcher requested the DOI in advance for a publication), do the following:
dp <- replaceMember(dp, metadataId, replacement=eml_path, newId="your pre-issued doi previously generated")
newPackageId <- uploadDataPackage(d1c_test, dp, public=TRUE, quiet=FALSE)
If the package has children, see how to do this using arcticdatautils::publish_update in the nesting section of the reference manual.
Refresh the landing page at test.arcticdata.io/#view/… for this package and then follow the “newer version” link to view the latest.
What if the researcher notices that some information needs to be updated in the data file? We can use replaceMember to do just that!
If you haven’t already:
- Locate the data package you published in the previous exercise by navigating to the URL test.arcticdata.io/#view/…
- Load the package and EML into R using the above commands.
Make a slightly different data file to upload to test.arcticdata.io for this exercise:
- Load the data file associated with the package into R as a data.frame. (Hint: use read.csv() to read the data file from your computer or the server.)
- Make an edit to the data in R (e.g. rename one of the columns).
- Save the edited data. (Hint: use write.csv(data, row.names = FALSE).)
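The three steps above can be sketched in base R as follows. The file, its contents, and the new column name are stand-ins invented for illustration; in the exercise you would read the file you downloaded from your own package instead.

```r
# Stand-in for the data file from your package (illustrative contents).
data_path <- tempfile(fileext = ".csv")
write.csv(data.frame(site = c("A", "B"), temp_c = c(1.5, 2.0)),
          data_path, row.names = FALSE)

# 1. Load the data file as a data.frame.
data <- read.csv(data_path)

# 2. Make an edit: rename the second column (hypothetical new name).
colnames(data)[2] <- "temperature_c"

# 3. Save the edited data to a new file, leaving the original untouched.
edited_path <- file.path(tempdir(), "edited_data.csv")
write.csv(data, edited_path, row.names = FALSE)
```

Writing to a new path keeps the original file available in case you need to compare the two versions later.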
Upload the new CSV file, get a new pid, and publish those updates:
- Update the data file in the package with the edited data file using replaceMember.
- Update your package using uploadDataPackage.