datapack Background

Adapted from the dataone and datapack vignettes.

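datapack is written differently from most R packages you may have encountered: it uses the S4 object system rather than the more common S3 system. In S4, objects have formally defined slots, and functions are generics that dispatch on the class of their arguments.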
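To make the S4 idea concrete, here is a minimal base R sketch. The class ToyPackage and the generic countObjects are hypothetical names invented for illustration; they are not part of datapack, and the real classes are more elaborate.

```r
library(methods)  # S4 tools from base R

# A toy S4 class (hypothetical, not part of datapack) with two formally declared slots
setClass("ToyPackage", representation(objects = "list", label = "character"))

# S4 functions are generics; methods are registered per class
setGeneric("countObjects", function(x) standardGeneric("countObjects"))
setMethod("countObjects", "ToyPackage", function(x) length(x@objects))

toy <- new("ToyPackage", objects = list(a = 1, b = 2), label = "demo")

isS4(toy)          # TRUE: this is an S4 object
toy@label          # slots are read with @
countObjects(toy)  # dispatches to the ToyPackage method
```

Functions such as getSize and getValue, used later in this chapter, are called the same way: you pass the object to the function, and R selects the method that matches the object's class.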

library(dataone)
library(datapack)
library(uuid)

Data packages

A data package is an instance of an S4 class with slots for relations (provenance), objects (the metadata and data files) and systemMetadata.

Data Objects

To see which slots an S4 object has, type the slot operator @ after its name, or press TAB with the cursor after an existing @. The examples below assume that dp holds a data package already loaded into your R session. Try viewing its slots by pressing TAB after typing the following:

dp@

Check out the objects slot:

dp@objects

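The objects slot contains a collection of objects keyed by PID; an individual object is accessed with the $ subsetting operator. You will see both @ and $ when navigating the structure of a data package in R: @ selects a slot, and $ selects a named element inside that slot.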
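The way @ and $ combine can be tried on a toy structure in base R. This is only a sketch: the class ToyPackage2 and the PIDs are made up, and a plain named list stands in for whatever type the real objects slot uses.

```r
library(methods)

# Hypothetical stand-in for a data package; a named list plays the role of the objects slot
setClass("ToyPackage2", representation(objects = "list", relations = "list"))

toy_pkg <- new("ToyPackage2",
               objects = list("urn:uuid:aaa" = "metadata.xml",
                              "urn:uuid:bbb" = "data.csv"),
               relations = list())

slotNames(toy_pkg)        # lists the slots, much like pressing TAB after toy_pkg@
names(toy_pkg@objects)    # the PIDs stored in the objects slot

# Backticks are needed with $ because the PID contains colons
toy_pkg@objects$`urn:uuid:bbb`

# Equivalent access with [[ ]], which takes the PID as an ordinary string
toy_pkg@objects[["urn:uuid:bbb"]]
```

The [[ ]] form is convenient when the PID is stored in a variable, since $ only accepts a literal name.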

Get the number of data and metadata files associated with this data package with datapack’s getSize function.

getSize(dp)

Get the file names and corresponding PIDs using datapack’s getValue function. You can also get other sysmeta slots such as formatId and size by changing the name argument.

getValue(dp, name="sysmeta@fileName")

Get identifiers

You can search by any of the sysmeta slots such as fileName and formatId and get the corresponding identifier(s):

metadataId <- selectMember(dp, name="sysmeta@ADD THE NAME OF THE SLOT", value="PATTERN TO SEARCH BY")

Example:

selectMember(dp, name="sysmeta@formatId", value="image/tiff") 
selectMember(dp, name="sysmeta@fileName", value="filename.csv")

These return the PIDs of the files whose formatId is “image/tiff” and whose file name is “filename.csv”, respectively.
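If you do not have a data package at hand, the functions above can be tried on a small package built locally. The sketch below assumes that the datapack constructors behave as in the package vignette (new("DataPackage"), new("DataObject", ...) and addMember) and that identifiers are generated automatically when none is supplied. The file names and the variable toy_dp are made up for illustration.

```r
library(datapack)

# Write two small throwaway files so the example needs no network access
csv_path <- file.path(tempdir(), "filename.csv")
txt_path <- file.path(tempdir(), "notes.txt")
write.csv(data.frame(x = 1:3), csv_path, row.names = FALSE)
writeLines("example notes", txt_path)

# Build a local package (assumption: a PID is generated for each object when no id is given)
toy_dp <- new("DataPackage")
toy_dp <- addMember(toy_dp, new("DataObject", format = "text/csv", filename = csv_path))
toy_dp <- addMember(toy_dp, new("DataObject", format = "text/plain", filename = txt_path))

getSize(toy_dp)                               # number of objects in the package
getValue(toy_dp, name = "sysmeta@fileName")   # file names, keyed by PID

# PIDs of the objects whose formatId matches the pattern
csv_ids <- selectMember(toy_dp, name = "sysmeta@formatId", value = "text/csv")
csv_ids
```

Because the PIDs are generated at run time, they will differ on every run; only the number of matches stays the same.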

Provenance

View the provenance as a table. It describes the relationships between the different files in the package. We will go into detail in the Building provenance chapter.

dp@relations$relations