NCEAS Data Team Reference Guide

Navigate through EML

The first task when editing an EML file is navigating it. An EML file is organized as many lists nested within other lists. The View function gives you a crude view of an EML file in the viewer, which can be useful for exploring the file.

library(dataone)
library(EML)

# Need to be in this member node to explore file
d1c_test <- dataone::D1Client("STAGING", "urn:node:mnTestARCTIC")

# Download the metadata object and parse it as EML
doc <- read_eml(getObject(d1c_test@mn, "urn:uuid:558eabf1-1e91-4881-8ba3-ef8684d8f6a1"))
View(doc)

The complex EML document is represented in R as a series of named, nested lists. We use lists all the time in R! A data.frame is one example of a special kind of list. You may be familiar with the syntax dataframe$column_name, which selects a particular column of a data.frame. Under the hood, a data.frame is a named list of vectors with the same length. You select one of those vectors using the $ operator, which is called the “list selector operator.”
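As a quick check of the claim that a data.frame is a named list, here is a minimal sketch in base R. The data.frame `df` and its column names are made up for illustration and do not come from any real package.

```r
# A small, made-up data.frame used only for illustration
df <- data.frame(site = c("A", "B", "C"), depth_m = c(1.5, 2.0, 3.2))

# A data.frame is a list under the hood
is.list(df)

# Each column is a named element of that list
names(df)

# `$` selects one element (here, one column vector) by name
df$depth_m

# `[[` with the element name selects the same vector as `$`
identical(df$depth_m, df[["depth_m"]])
```

The same `$` and `[[` operators are what you use on the nested lists of an EML document.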

Just as you navigate a data.frame, you can use the $ operator to navigate through the EML structure. The $ operator lets you go deeper into the EML structure and see which elements are nested within other elements. However, you have to tell R where you want to go in the structure when you use the $ symbol. For example, to view the dataset element of your EML, use the command doc$dataset. To view the creators of your data set, use doc$dataset$creator. Note here that creator is contained within dataset. If you aren’t sure where you want to go, press the Tab key after typing $ and a list of available elements in the structure will appear (e.g., doc$<TAB>).
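If you do not have a real EML document loaded, the same navigation can be tried on a hand-built nested list. The sketch below assumes a hypothetical stand-in called `doc_sketch`, with invented content. It only mimics the shape of an EML document and is not produced by the EML package. Calling `names()` is a rough console equivalent of the autocomplete list.

```r
# Hypothetical stand-in for an EML document: a named list of named lists
doc_sketch <- list(
  dataset = list(
    title = "Example dataset title",
    abstract = "Example abstract text.",
    creator = list(
      list(individualName = list(givenName = "Ada", surName = "Example")),
      list(individualName = list(givenName = "Ben", surName = "Sample"))
    )
  )
)

# names() lists the elements available at a given level,
# much like the autocomplete pop-up after typing `$`
names(doc_sketch)
names(doc_sketch$dataset)

# Chain `$` to move one level deeper each time
doc_sketch$dataset$title
```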

Note that if you press Tab and nothing pops up, this most likely means that you are trying to go into an EML element that can take a series of items. For example, doc$dataset$creator$<TAB> will not show a pop-up menu. This is because creator is a series-type object (i.e., you can have multiple creators). If you want to go deeper into creator, you must first tell R which creator you are interested in. Do this by writing [[i]] first, where i is the index of the creator you are concerned with. For example, if you want to look at the first creator, i = 1. Now doc$dataset$creator[[1]]$<TAB> will give you many more options. Note that an empty autocomplete result sometimes means you have reached the end of a branch in the EML structure.
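The reason autocomplete comes up empty on a series can be seen with a small base R sketch. The list `creators` below is a hypothetical stand-in for doc$dataset$creator with two invented creators. It assumes the series is stored as an unnamed list, as described above.

```r
# Hypothetical stand-in for doc$dataset$creator holding two creators
creators <- list(
  list(individualName = list(givenName = "Ada", surName = "Example")),
  list(individualName = list(givenName = "Ben", surName = "Sample"))
)

# The series itself has no names, so there is nothing to autocomplete
is.null(names(creators))

# length() tells you how many items are in the series
length(creators)

# Pick one item by index, and named elements are available again
names(creators[[1]])
creators[[1]]$individualName$surName
```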

The structure of doc$dataset shows this: creator is a series that can hold multiple creators, each of which can be accessed individually by index with doc$dataset$creator[[#]].
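One way to print such a structure in the console is base R's `str()` with `max.level`, which limits how deep the listing goes. The sketch below uses a hypothetical `dataset_sketch` list with invented values in place of a real doc$dataset. It also shows how to visit every item of a series by index rather than one at a time.

```r
# Hypothetical stand-in for doc$dataset
dataset_sketch <- list(
  title = "Example dataset title",
  creator = list(
    list(individualName = list(givenName = "Ada", surName = "Example")),
    list(individualName = list(givenName = "Ben", surName = "Sample"))
  ),
  abstract = "Example abstract text."
)

# Print a compact outline, showing only the top two levels of nesting
str(dataset_sketch, max.level = 2)

# Visit each creator in the series and pull out one nested field
surnames <- vapply(
  dataset_sketch$creator,
  function(creator) creator$individualName$surName,
  character(1)
)
surnames
```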

At this point stop and take a deep breath. The key takeaway is that EML is a hierarchical tree structure. The best way to get familiar with it is to explore the structure. Try entering doc$dataset into your console, and print it. Now make the search more specific, for instance: doc$dataset$abstract.