Chapter 3 Exploring EML

We use the Ecological Metadata Language (EML) to store structured metadata for all datasets submitted to the Arctic Data Center. EML is written in XML (Extensible Markup Language), and the EML R package provides functions for building and editing it.

Currently, the Arctic Data Center website supports editing EML version 2.2.0. Some metadata records are still in version 2.1.1 and will be converted eventually.

For additional background on EML and principles for metadata creation, check out this paper.

If you aren’t yet familiar with lists and how to navigate them, take a look at the relevant sections in the Stat 545 class.
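As a quick refresher, the sketch below uses a small hand-built nested list to show the access patterns used throughout this chapter: `$` for named elements, `[[ ]]` for a single item, and `[ ]` for a sub-list. The names and values are made up for illustration and do not come from a real EML document.

```r
# A hypothetical nested list shaped loosely like a piece of EML.
# All names and values here are illustrative only.
dataset <- list(
  title = "Example dataset",
  creator = list(
    list(individualName = list(givenName = "Ada", surName = "Lovelace")),
    list(organizationName = "Example Institute")
  )
)

# `$` extracts a named element.
dataset$title

# `[[ ]]` extracts a single item by position (or by name).
first_creator <- dataset$creator[[1]]
first_creator$individualName$surName

# `[ ]` keeps the list wrapper and returns a shorter list.
only_second <- dataset$creator[2]
length(only_second)

# str() prints a compact overview of the nesting.
str(dataset, max.level = 2)
```

The difference between `[[ ]]` and `[ ]` matters most in practice: the first hands back the item itself, the second a list that still contains it.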

3.2 Understand the EML schema

Another great resource for navigating the EML structure is the schema, which defines that structure. The schema diagrams on this page are interactive. Further explanations of the symbology can be found here. The schema is complicated and may take some time to get familiar with. Use your browser’s “search in page” function (usually CTRL-F or Command-F) to navigate the EML schema page quickly.

For example, let’s take a look at eml-party. To start off, notice that some elements have bolded lines leading to them.

A bold line indicates that the element is required if the element above it (to the left in the schema) is used; otherwise, the element is optional.

Notice also that next to the givenName element it says “0..infinity”. This means that the element is unbounded — a single party can have many given names and there is no limit on how many you can add. However, this text does not appear for the surName element — a party can have only one surname.

You will also see icons linking the EML slots together, which indicate the ordering of subsequent slots. These can indicate either a “sequence” or a “choice”. In our example from eml-party, a “choice” icon indicates that either an individualName, organizationName, or positionName is required, but you do not need all three. However, the “sequence” icon tells us that if you use an individualName, you must include the surName as a child element. If you include the optional child elements salutation and givenName, they must be written in the order presented in the schema.
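To connect these schema rules to R, here is a rough sketch of how a party might look as a nested list. The values are invented, and `has_name_choice()` is a hypothetical helper written for this example, not a function from the EML package. It only mirrors the "choice" rule described above; real validation should be done against the schema itself.

```r
# Hypothetical party; all values are made up for illustration.
party <- list(
  individualName = list(
    # givenName is unbounded ("0..infinity"), so it may hold several values.
    givenName = list("Mary", "Ann"),
    # surName is required whenever individualName is used and occurs once.
    # It follows givenName, matching the order given in the schema.
    surName = "Smith"
  )
)

# Hypothetical helper: checks that at least one of the three name
# elements offered by the "choice" is present.
has_name_choice <- function(p) {
  any(c("individualName", "organizationName", "positionName") %in% names(p))
}

has_name_choice(party)

# A party with only an email address does not satisfy the choice.
has_name_choice(list(electronicMailAddress = "someone@example.org"))
```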

The EML schema sections you may find particularly helpful include eml-party, eml-attribute, and eml-physical.

For a more detailed description of the EML schema, see the reference section on exploring EML.

3.2.1 Check your understanding

  • Find otherEntity within the EML schema. Which elements are required? Can otherEntity be a series object?
Answer
otherEntity requires the entityName and entityType children or, alternatively, accepts only a references element. It is a series object, so there can be multiple otherEntity elements. Along with otherEntity and creator, dataTable and attribute can also be series objects.
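In R, a series object can be pictured as an unnamed list whose items are the individual elements. The sketch below assumes two made-up otherEntity elements to show that shape; the entity names and types are placeholders.

```r
# Two hypothetical otherEntity elements, each with the required
# entityName and entityType children.
other_entities <- list(
  list(entityName = "site_map.pdf", entityType = "application/pdf"),
  list(entityName = "analysis.R", entityType = "text/plain")
)

# Number of entities in the series.
length(other_entities)

# Pull one child from every entity in the series.
entity_names <- vapply(other_entities, function(e) e$entityName, character(1))
entity_names

# Check that every entity carries both required children.
required <- c("entityName", "entityType")
all_complete <- all(vapply(
  other_entities,
  function(e) all(required %in% names(e)),
  logical(1)
))
all_complete
```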

3.3 Access specific elements

The eml_get() function is a powerful tool for exploring EML (more on that here). It takes any chunk of EML and returns all instances of the element you specify. Note: you must spell the element of interest exactly as EML does, including capitalization. Here are some examples:

library(EML)              # read_eml(), eml_get()
library(arcticdatautils)  # example file, eml_get_simple(), which_in_eml()

# Read the example EML document that ships with arcticdatautils
doc <- read_eml(system.file("example-eml.xml", package = "arcticdatautils"))

eml_get(doc, "creator")
#> individualName:
#>   givenName: Bryce
#>   surName: Mecum
#> organizationName: National Center for Ecological Analysis and Synthesis

eml_get(doc, "boundingCoordinates")
#> eastBoundingCoordinate: '-134'
#> northBoundingCoordinate: '59'
#> southBoundingCoordinate: '57'
#> westBoundingCoordinate: '-135'

eml_get(doc, "url")
#> '':
#>   function: download
#>   url: ecogrid://knb/urn:uuid:89bec5d0-26db-48ac-ae54-e1b4c999c456
#> '': ecogrid://knb/urn:uuid:89bec5d0-26db-48ac-ae54-e1b4c999c456
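
To build intuition for what "returns all instances" means, the following base-R sketch walks a nested list recursively and collects every element with a given name. It is an illustration only: `find_elements()` is a hypothetical function and the list is made up. This is not how eml_get() is implemented.

```r
# Hypothetical helper: collect every element called `name`, at any depth.
find_elements <- function(x, name) {
  if (!is.list(x)) return(list())
  found <- list()
  nms <- names(x)
  for (i in seq_along(x)) {
    # Keep the item if its name matches exactly (case-sensitive).
    if (!is.null(nms) && identical(nms[i], name)) {
      found <- c(found, list(x[[i]]))
    }
    # Search the item's own children as well.
    found <- c(found, find_elements(x[[i]], name))
  }
  found
}

# Made-up document with surName appearing in two different places.
toy_doc <- list(
  dataset = list(
    creator = list(individualName = list(givenName = "Ada", surName = "Lovelace")),
    contact = list(individualName = list(givenName = "Grace", surName = "Hopper"))
  )
)

surnames <- unlist(find_elements(toy_doc, "surName"))
surnames

# A different capitalization matches nothing.
length(find_elements(toy_doc, "surname"))
```

The last call shows why the element name has to follow EML's spelling and capitalization exactly.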

eml_get_simple(), from the arcticdatautils package, is a simplified alternative to eml_get() that returns the values of the desired EML element as a plain vector rather than a more complex EML object.

eml_get_simple(doc$dataset$otherEntity, "entityName")

To find the index of an element in an EML list, you can use either which_in_eml from the arcticdatautils package or a combination of eml_get_simple and which. Use whichever workflow you prefer.

An example question you may have: Which creators have a surName “Mecum”?

Example using which_in_eml:

n <- which_in_eml(doc$dataset$creator, "surName", "Mecum")
# Answer: doc$dataset$creator[[n]]

Example using eml_get_simple and which:

sur_names <- eml_get_simple(doc$dataset$creator, "surName")
i <- which(sur_names == "Mecum")
# Answer: doc$dataset$creator[[i]]
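
The second workflow boils down to ordinary vector logic: extract one value per list item, then ask `which()` for the matching positions. The base-R sketch below runs that logic on a made-up creator list (not the example document used above) to show that `which()` can return several indices, or none, so it is worth checking the result before indexing.

```r
# Hypothetical list of creators; the given names are placeholders.
creators <- list(
  list(individualName = list(givenName = "First", surName = "Mecum")),
  list(individualName = list(givenName = "Second", surName = "Lovelace")),
  list(individualName = list(givenName = "Third", surName = "Mecum"))
)

# One surname per creator, as a character vector.
sur_names <- vapply(
  creators,
  function(cr) cr$individualName$surName,
  character(1)
)

# Positions of every creator whose surname matches.
matches <- which(sur_names == "Mecum")
matches

# With more than one match, subset with `[ ]` to keep all of them.
matching_creators <- creators[matches]
length(matching_creators)

# No match gives an empty index vector.
no_match <- which(sur_names == "Nobody")
length(no_match)
```

When there may be several matches, `[ ]` is the safer choice, because `[[ ]]` with a vector of indices performs recursive indexing instead of returning multiple items.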