Edit attributeLists

Attributes are descriptions of variables, typically columns or column names in tabular data. Attributes are stored in an attributeList. When editing attributes in R, we convert the attribute list information to data frame (table) format so that it is easier to edit. When editing attributes you will need to create one to three data frame objects:

  1. A data.frame of attributes
  2. A data.frame of custom units (if applicable)
  3. A data.frame of factors (if applicable)

The attributeList is an element within one of 4 different types of entity objects. An entity corresponds to a file, typically. Multiple entities (files) can exist within a dataset. The 4 different entity types are dataTable (most common for us), spatialVector, spatialRaster, and otherEntity.

Please note that submitting attribute information through the website will store them in an otherEntity object by default. We prefer to store them in a dataTable object for tabular data or a spatialVector or spatialRaster object for spatial data.

To edit or examine an existing attribute list already in an EML file, you can use the following commands, where i represents the index of the series element you are interested in. Note that if there is only one item in the series (ie there is only one dataTable), you should just call doc$dataset$dataTable, as in this case doc$dataset$dataTable[[1]] will return the first sub-element of the dataTable (the entityName)

# If they are stored in an otherEntity (submitted from the website by default)
attribute_tables <- EML::get_attributes(doc$dataset$otherEntity[[i]]$attributeList)
# Or if they are stored in a dataTable (usually created by a datateam member)
attribute_tables <- EML::get_attributes(doc$dataset$dataTable[[i]]$attributeList)

The get_attributes() function returns the attribute_tables object, which is a list of the three data frames mentioned above. The data frame with the attributes is called attribute_tables$attributes.

attribute_tables$attributes
print(attribute_tables$attributes)

Edit attributes

Attribute information should be stored in a data.frame with the following columns:

  • attributeName: The name of the attribute as listed in the csv. Required. e.g.: “c_temp”
  • attributeLabel: A descriptive label that can be used to display the name of an attribute. It is not constrained by system limitations on length or special characters. Optional. e.g.: “Temperature (Celsius)”
  • attributeDefinition: Longer description of the attribute, including the required context for interpreting the attributeName. Required. e.g.: “The near shore water temperature in the upper inter-tidal zone, measured in degrees Celsius.”
  • measurementScale: One of: nominal, ordinal, dateTime, ratio, interval. Required.
    • nominal: unordered categories or text. e.g.: (Male, Female) or (Yukon River, Kuskokwim River)
    • ordinal: ordered categories. e.g.: Low, Medium, High
    • dateTime: date or time values from the Gregorian calendar. e.g.: 01-01-2001
    • ratio: measurement scale with a meaningful zero point in nature. Ratios are proportional to the measured variable. e.g.: 0 Kelvin represents a complete absence of heat. 200 Kelvin is half as hot as 400 Kelvin. 1.2 meters per second is twice as fast as 0.6 meters per second.
    • interval: values from a scale with equidistant points, where the zero point is arbitrary. This is usually reserved for degrees Celsius or Fahrenheit, or latitude and longitude coordinates, or any other human-constructed scale. e.g.: there is still heat at 0° Celsius; 12° Celsius is NOT half as hot as 24° Celsius.
  • domain: One of: textDomain, enumeratedDomain, numericDomain, dateTime. Required.
    • textDomain: text that is free-form, or matches a pattern
    • enumeratedDomain: text that belongs to a defined list of codes and definitions. e.g.: CASC = Cascade Lake, HEAR = Heart Lake
    • dateTimeDomain: dateTime attributes
    • numericDomain: attributes that are numbers (either ratio or interval)
  • formatString: Required for dateTime, NA otherwise. Format string for dates, e.g. “DD/MM/YYYY”.
  • definition: Required for textDomain, NA otherwise. Defines a format for attributes that are a character string. e.g.: “Any text” or “7-digit alphanumeric code”
  • unit: Required for numericDomain, NA otherwise. Unit string. If the unit is not a standard unit, a warning will appear when you create the attribute list, saying that it has been forced into a custom unit. Use caution here to make sure the unit really needs to be a custom unit. A list of standard units can be found using: standardUnits <- EML::get_unitList() then running View(standardUnits$units).
  • numberType: Required for numericDomain, NA otherwise. Options are real, natural, whole, and integer.
    • real: positive and negative fractions and integers (…-1,-0.25,0,0.25,1…)
    • natural: non-zero positive integers (1,2,3…)
    • whole: positive integers and zero (0,1,2,3…)
    • integer: positive and negative integers and zero (…-2,-1,0,1,2…)
  • missingValueCode: Code for missing values (e.g.: ‘-999’, ‘NA’, ‘NaN’). NA otherwise. Note that an NA missing value code should be a string, ‘NA’, and numbers should also be strings, ‘-999.’
  • missingValueCodeExplanation: Explanation for missing values, NA if no missing value code exists.

You can create attributes manually by typing them out in R following a workflow similar to the one below:

attributes <- data.frame(
    
    attributeName = c('Date', 'Location', 'Region','Sample_No', 'Sample_vol', 'Salinity', 'Temperature', 'sampling_comments'),
    attributeDefinition = c('Date sample was taken on', 'Location code representing location where sample was taken','Region where sample was taken', 'Sample number', 'Sample volume', 'Salinity of sample in PSU', 'Temperature of sample', 'comments about sampling process'),
    measurementScale = c('dateTime', 'nominal','nominal', 'nominal', 'ratio', 'ratio', 'interval', 'nominal'),
    domain = c('dateTimeDomain', 'enumeratedDomain','enumeratedDomain', 'textDomain', 'numericDomain', 'numericDomain', 'numericDomain', 'textDomain'),
    formatString = c('MM-DD-YYYY', NA,NA,NA,NA,NA,NA,NA),
    definition = c(NA,NA,NA,'Six-digit code', NA, NA, NA, 'Any text'),
    unit = c(NA, NA, NA, NA,'milliliter', 'dimensionless', 'celsius', NA),
    numberType = c(NA, NA, NA,NA, 'real', 'real', 'real', NA),
    missingValueCode = c(NA, NA, NA,NA, NA, NA, NA, 'NA'),
    missingValueCodeExplanation = c(NA, NA, NA,NA, NA, NA, NA, 'no sampling comments'))

However, typing this out in R can be a major pain. Luckily, there’s a Shiny app that you can use to build attribute information. You can use the app to build attributes from a data file loaded into R (recommended as the app will auto-fill some fields for you) to edit an existing attribute table, or to create attributes from scratch. Use the following commands to create or modify attributes.

Use the following commands to create or modify attributes. These commands will launch a “Shiny” app in your web browser.

#first download the CSV in your data package from Exercise #2
data_pid <- selectMember(dp, name = "sysmeta@fileName", value = ".csv")
data <- read.csv(text=rawToChar(getObject(d1c_test@mn, data_pid)))
# From data (recommended)
attribute_tables <- EML::shiny_attributes(data = data)

# From scratch
attribute_tables <- EML::shiny_attributes()

# From an existing attribute list
attribute_tables <- get_attributes(doc$dataset$dataTable[[i]]$attributeList)
attribute_tables <- EML::shiny_attributes(attributes = attribute_tables$attributes)

Once you are done editing a table in the browser app, quit the app by pressing the red “Quit App” button in the top right corner of the page.

If you close the Shiny app tab in your browser instead of using the “Quit App” button, your work will not be saved, R will think that the Shiny app is still open, and you will not be able to run other code. You can tell if R is confused if you have closed the Shiny app and the bottom line in the console still says Listening on http://.... If this happens, press the red stop sign button on the right hand side of the console window in order to interrupt R.

The tables you constructed in the app will be assigned to the attribute_tables variable as a list of data frames (one for attributes, factors, and units). Be careful to not overwrite your completed attribute_tables object when trying to make edits. The last line of code can be used in order to make edits to an existing attribute_tables object.

Alternatively, each table can be to exported to a csv file by clicking the Download button. If you downloaded the table, read the table back into your R session and assign it to a variable in your script (e.g. attributes <- data.frame(...)), or just use the variable that shiny_attributes returned.

For simple attribute corrections, datamgmt::edit_attribute() allows you to edit the slots of a single attribute within an attribute list. To use this function, pass an attribute through datamgmt::edit_attribute() and fill out the parameters you wish to edit/update. An example is provided below where we are changing attributeName, domain, and measurementScale in the first attribute of a dataset. After completing the edits, insert the new version of the attribute back into the EML document.

new_attribute <- datamgmt::edit_attribute(doc$dataset$dataTable[[1]]$attributeList$attribute[[1]], attributeName = 'date_and_time', domain = 'dateTimeDomain', measurementScale = 'dateTime')
doc$dataset$dataTable[[1]]$attributeList$attribute[[1]] <- new_attribute

Edit custom units

EML has a set list of units that can be added to an EML file. These can be seen by using the following code:

standardUnits <- EML::get_unitList()
View(standardUnits$units)

Search the units list for your unit before attempting to create a custom unit. You can search part of the unit you can look up part of the unit ie meters in the table to see if there are any matches.

If you have units that are not in the standard EML unit list, you will need to build a custom unit list. Attribute tables with custom units listed will generate a warning indicating that a custom unit will need to be described.

A unit typically consists of the following fields:

  • id: The unit id (ids are camelCased)
  • unitType: The unitType (run View(standardUnits$unitTypes) to see standard unitTypes)
  • parentSI: The parentSI unit (e.g. for kilometer parentSI = “meter”). The parentSI does not need to be part of the unitList.
  • multiplierToSI: Multiplier to the parentSI unit (e.g. for kilometer multiplierToSI = 1000)
  • name: Unit abbreviation (e.g. for kilometer name = “km”)
  • description: Text defining the unit (e.g. for kilometer description = “1000 meters”)

To manually generate the custom units list, create a dataframe with the fields mentioned above. An example is provided below that can be used as a template:

custom_units <- data.frame(
    
  id = c('siemensPerMeter', 'decibar'),
  unitType = c('resistivity', 'pressure'),
  parentSI = c('ohmMeter', 'pascal'),
  multiplierToSI = c('1','10000'),
  abbreviation = c('S/m','decibar'),
  description = c('siemens per meter', 'decibar'))

Custom units can also be created in the shiny app, under the “units” tab. They cannot be edited again in the shiny app once created.

attribute_tables <- EML::shiny_attributes()

custom_units <- attribute_tables$units

Using EML::get_unit_id for custom units will also generate valid EML unit ids.

Custom units are then added to additionalMetadata using the following command:

unitlist <- set_unitList(custom_units, as_metadata = TRUE)
doc$additionalMetadata <-  list(metadata = list(unitList = unitlist))

Edit factors

For attributes that are enumeratedDomains, a table is needed with three columns: attributeName, code, and definition.

  • attributeName should be the same as the attributeName within the attribute table and repeated for all codes belonging to a common attribute.
  • code should contain all unique values of the given attributeName that exist within the actual data.
  • definition should contain a plain text definition that describes each code.

To build factors by hand, you use the named character vectors and then convert them to a data.frame as shown in the example below. In this example, there are two enumerated domains in the attribute list - “Location” and “Region”.

Location <- c(CASC = 'Cascade Lake', CHIK = 'Chikumunik Lake', HEAR = 'Heart Lake', NISH = 'Nishlik Lake' )
Region <- c(W_MTN = 'West region, locations West of Eagle Mountain', E_MTN = 'East region, locations East of Eagle Mountain')

The definitions are then written into a data.frame using the names of the named character vectors and their definitions.

factors <- rbind(data.frame(attributeName = 'Location', code = names(Location), definition = unname(Location)),
                  data.frame(attributeName = 'Region', code = names(Region), definition = unname(Region)))

Factors can also be created in the shiny app, under the “factors” tab. They cannot be edited again in the shiny app once created.

attribute_tables <- EML::shiny_attributes()

attribute_tables$factors

Finalize attributeList

Once you have built your attributes, factors, and custom units, you can add them to EML objects. Attributes and factors are combined to form an attributeList using set_attributes():

# Create an attributeList object
attributeList <- EML::set_attributes(attributes = attribute_tables$attributes,
                                     factors = attribute_tables$factors) 

This attributeList object can then be checked for errors and added to a dataTable in the EML document.

# Edit EML document with object
doc$dataset$dataTable[[i]]$attributeList <- attributeList