Final Checklist

You can click on the assessment report on the website to for a general check. Fix anything you see there.

Send the link over slack for peer review by your fellow datateam members. Usually we look for the following (the list is not exhaustive):

Special Datasets

Please refer to the dedicated pages for instructions to handle these cases:

System Metadata

the format ids are correct

General EML

  • Ethical Research Practice section is completed
  • Publisher and system information have been added:
doc <- eml_add_publisher(doc)
doc <- eml_add_entity_system(doc)

Title

  • Include geographic and temporal coverage (when and where data was collected)
  • Abbreviations are removed or defined

Abstract

  • longer than 100 words
  • Abbreviations are removed or defined. No garbled text
  • Tags such as <superscript>2</superscript> and <subscript>2</subscript> can be used for nicer formatting

DataTable / OtherEntity / SpatialVectors

  • In the correct one: DataTable / OtherEntity / SpatialVector / SpatialRaster for the file type
  • entityDescription - longer than 5 words and unique
  • physical present and up-to-date and format is correct

Attribute Table

  • Complete
  • attributeDefinitions longer than 3 words
  • Variables match what is in the file
  • Measurement domain - if appropirate (ie dateTime correct)
  • Missing Value Code - accounted for if applicable
  • Semantic Annotation - appropriate semantic annotations added, especially for spatial and temporal variables: lat, lon, date etc.
  • Custom units were created if necessary

People

  • Complete information for each person in each section
    • includes ORCID and e-mail address for all contacts
    • people repeated across sections have consistent information

Geographic region

  • the map looks correct and matches the geographic description
  • check if negatives (-) are missing

Project

  • If it is an NSF award you can use the helper function:
    • doc$dataset$project <- eml_nsf_to_project(awards)
  • for other awards that need to be set manually, see the set project page

Methods

  • Present
  • No garbled text
  • Acronyms and units are defined the first time they are used

Check EML Version

  • Currently using: eml-2.2.0 (as of July 30 2020)
  • Review to see if the EML version is set correctly by reviewing the doc$`@context` that it is indeed 2.2.0 under eml
  • Re-run your code again and have the lineemld::eml_version("eml-2.2.0") at the top
  • Make sure the system metadata is also 2.2.0

Access

  • If necessary, granted access to PI using set_rights_and_access()
    • make sure it is http:// (no s)
  • note if it is a part of portals there might be specific access requirements for it to be visible using set_access()

SFTP Files

  • If there are files transferred to us via SFTP, delete those files once they have been added to the dataset and when the ticket is resolved

Updated datasets

All the above applies. These are some areas to do a closer check when users update with a new file:

  • New data was added
    • Temporal Coverage and Title
    • Confirm if new data used same methods or if methods need updating
  • Files were replaced
    • update physical and entityName
    • double-check attributes are the same
    • check for any new missing value codes that should be accounted for
  • Was the dataset published before 2021?
    • update project info, annotations
  • Glance over entire page for any small mistakes (ie. repeated additionalMetadata, any missed &amps, typos)