We use the Ecological Metadata Language (EML) to store structured metadata for all datasets submitted to the Arctic Data Center. EML is written in XML (Extensible Markup Language), and the functions for building and editing EML are in the EML R package.
Currently the Arctic Data Center website supports editing EML version 2.2.0. Some metadata records are still in version 2.1.1 and will be converted eventually.
For additional background on EML and principles for metadata creation, check out this paper.
If you aren’t yet familiar with lists and how to navigate them, take a look at the relevant sections in the Stat 545 class.
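As a quick orientation, here is a minimal sketch of list navigation in base R. The doc object below is a hand-built stand-in with placeholder values (an assumption for illustration, not a real metadata record); the same $ and [[ ]] operators are what you use on a parsed EML document.

```r
# Hand-built stand-in for a parsed EML document (placeholder values).
doc <- list(
  dataset = list(
    title = "Example dataset",
    creator = list(
      list(individualName = list(givenName = "Bryce", surName = "Mecum")),
      list(organizationName = "Example Organization")
    )
  )
)

# `$` selects an element by name; `[[ ]]` selects by position (or name).
first_creator <- doc$dataset$creator[[1]]
first_surname <- first_creator$individualName$surName

# names() lists the slots available at a given level.
top_level_slots <- names(doc$dataset)

# length() counts how many creators are stored.
n_creators <- length(doc$dataset$creator)

print(first_surname)
print(top_level_slots)
print(n_creators)
```

Calling names() at each level before drilling further is a convenient way to find out which slots exist without printing the whole structure.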
Another great resource for navigating the EML structure is the schema, which defines that structure. The schema diagrams on this page are interactive. Further explanations of the symbology can be found here. The schema is complicated, and it may take some time to become familiar with it before you fully understand it.
For example, let’s take a look at eml-party. To start off, notice that some elements have bolded lines leading to them.
A bold line indicates that the element is required if the element above it (to the left in the schema) is used; otherwise, the element is optional.
Notice also that next to the givenName element it says “0..infinity”. This means that the element is unbounded: a single party can have many given names, and there is no limit on how many you can add. However, this text does not appear for the surName element, so a party can have only one surname.
You will also see icons linking the EML slots together, which indicate the ordering of subsequent slots. These can indicate either a “sequence” or a “choice”. In our example from eml-party, a “choice” icon indicates that either an individualName, organizationName, or positionName is required, but you do not need all three. However, the “sequence” icon tells us that if you use an individualName, you must include the surName as a child element. If you include the optional child elements salutation or givenName, they must be written in the order presented in the schema.
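The rules read off the diagram can be sketched as a small check in base R. The helper check_party() and the hand-built lists are hypothetical, introduced only for illustration; each list stands in for XML child elements in document order. The helper covers only the three rules described above and is not a substitute for validating against the actual schema.

```r
# Hypothetical helper: checks only the eml-party rules described above.
check_party <- function(party) {
  name_slots <- c("individualName", "organizationName", "positionName")

  # "Choice": at least one of the three naming elements must be present.
  if (!any(name_slots %in% names(party))) {
    return("needs individualName, organizationName, or positionName")
  }

  ind <- party[["individualName"]]
  if (!is.null(ind)) {
    # surName is required, and only one is allowed.
    if (sum(names(ind) == "surName") != 1) {
      return("individualName needs exactly one surName")
    }
    # "Sequence": children must appear in the schema order.
    schema_order <- c("salutation", "givenName", "surName")
    positions <- match(names(ind), schema_order)
    if (anyNA(positions) || is.unsorted(positions)) {
      return("individualName has unexpected or out-of-order children")
    }
  }
  "ok"
}

# Placeholder parties; "A" and "B" are made-up given names.
valid       <- list(individualName = list(givenName = "Bryce", surName = "Mecum"))
two_given   <- list(individualName = list(givenName = "A", givenName = "B",
                                          surName = "Mecum"))
org_only    <- list(organizationName = "Example Organization")
wrong_order <- list(individualName = list(surName = "Mecum", givenName = "Bryce"))
no_name     <- list(electronicMailAddress = "someone@example.org")

results <- vapply(
  list(valid = valid, two_given = two_given, org_only = org_only,
       wrong_order = wrong_order, no_name = no_name),
  check_party, character(1)
)
print(results)
```

The first three parties pass, while wrong_order fails the sequence rule and no_name fails the choice rule.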
For a more detailed description of the EML schema, see the reference section on exploring EML.
The eml_get() function is a powerful tool for exploring EML (more on that here). It takes any chunk of EML and returns all instances of the element you specify. Note: you’ll have to specify the element of interest exactly, according to the spelling/capitalization conventions used in EML. Here are some examples:
library(EML)
doc <- read_eml(system.file("example-eml.xml", package = "arcticdatautils"))
eml_get(doc, "creator")
individualName:
  givenName: Bryce
  surName: Mecum
organizationName: National Center for Ecological Analysis and Synthesis
eml_get(doc, "boundingCoordinates")

eastBoundingCoordinate: '-134'
northBoundingCoordinate: '59'
southBoundingCoordinate: '57'
westBoundingCoordinate: '-135'

eml_get(doc, "url")

'':
  function: download
  url: ecogrid://knb/urn:uuid:89bec5d0-26db-48ac-ae54-e1b4c999c456
'': ecogrid://knb/urn:uuid:89bec5d0-26db-48ac-ae54-e1b4c999c456
eml_get_simple() is a simplified alternative to eml_get() that produces a list of the desired EML element.
To find an EML element, you can use either which_in_eml() from the arcticdatautils package, or a combination of eml_get_simple() and which(), to find its index in an EML list. Use whichever workflow you see fit.
An example question you may have: Which creators have a surName “Mecum”?
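# Option 1: which_in_eml() from arcticdatautils
library(arcticdatautils)
n <- which_in_eml(doc$dataset$creator, "surName", "Mecum")
# Answer: doc$dataset$creator[[n]]

# Option 2: eml_get_simple() combined with which()
ent_names <- eml_get_simple(doc$dataset$creator, "surName")
i <- which(ent_names == "Mecum")
# Answer: doc$dataset$creator[[i]]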
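One caveat worth sketching: which() returns every matching index, so if more than one creator shares a surname, [[i]] with a vector of indices will not return all of them. The example below is a minimal base R sketch under stated assumptions: creators is a hand-built stand-in for doc$dataset$creator (the second and third entries are hypothetical), and vapply() stands in for eml_get_simple() so that the code runs without any packages.

```r
# Hand-built stand-in for doc$dataset$creator (entries 2 and 3 are made up).
creators <- list(
  list(individualName = list(givenName = "Bryce", surName = "Mecum")),
  list(organizationName = "Example Organization"),
  list(individualName = list(givenName = "Casey", surName = "Mecum"))
)

# Collect one surName per creator; creators without one get NA.
sur_names <- vapply(creators, function(x) {
  s <- x$individualName$surName
  if (is.null(s)) NA_character_ else s
}, character(1))

# which() returns all matching positions and skips NA comparisons.
i <- which(sur_names == "Mecum")

# Single brackets keep every match; [[ ]] is only safe when length(i) == 1.
matches <- creators[i]

if (length(i) == 0) {
  message("No creator with that surName")
}

print(i)
print(length(matches))
```

Checking length(i) before indexing makes it obvious whether the question has zero, one, or several answers.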