Email templates

This section covers newly submitted data packages. For other inquiries, see the PI FAQ templates.

Please think critically when using these canned replies rather than just blindly sending them. Typically, content should be adjusted and customized for each response to be as relevant, complete, and precise as possible.

In your first few months, please run email drafts by the #datateam Slack and get approval before sending.

Remember to consult the submission guidelines for details of what is expected.

Quick reference:

Initial email template

Hello [NAME OF REQUESTOR],

Thank you for your recent submission to the NSF Arctic Data Center!

From my preliminary examination of your submission I have noticed a few items that I would like to bring to your attention. We are here to help you publish your submission, but your continued assistance is needed to do so. See comments below:

[COMMENTS HERE]

After we receive your responses, we can make the edits on your behalf, or you are welcome to make them yourself using our user interface.

Best,

[YOUR NAME]

Comment templates based on what is missing

Portals

Multiple datasets under the same project - suggest data portal feature

I would like to highlight the Data Portals feature, which would make it easier for your datasets to be discovered together. It also enables some custom branding for the group or project. Here is an example page that we created for the Distributed Biological Observatory: https://arcticdata.io/catalog/portals/DBO. We highly recommend creating a portal for your datasets; you can get one started here: https://arcticdata.io/catalog/edit/portals/new/Settings. More information on how to set one up can be found here: https://arcticdata.io/data-portals/. Data portals can be set up at any point in your project.

If they ask to nest the dataset

We are no longer supporting new nested datasets. We recommend creating a data portal instead. Portals allow more control and have the same functionality as a nested dataset. You can get one started here: https://arcticdata.io/catalog/edit/portals/new/Settings. More information on how to set one up can be found here: https://arcticdata.io/data-portals/, but as always we are here to help over email.

Dataset citations

If there is a publication associated with this dataset, we would appreciate it if you could register the DOI of your published paper with us by using the Citations button right below the title at the dataset landing page. We are working to build our catalog of dataset citations in the Arctic Data Center. This can be done at any point.

Title

Provides the what, where, and when of the data

We would like to add some more context to your data package title. I would like to suggest: ‘OUR SUGGESTION HERE, WHERE, WHEN’.

Does not use acronyms

We wanted to clarify a couple of abbreviations. Could you help us define the following: [LIST OF ACRONYMS TO DEFINE HERE]

Abstract

Describes DATA in package (ideally > 100 words)

We would like to add some additional context to your abstract. We hope to add the following: [ADJUST THIS DEPENDING ON WHAT IS MISSING]

  • The motivation of the study
  • Where and when the research took place
  • At least one sentence summarizing general methodologies
  • Definitions for all acronyms
  • Enough detail to bring the abstract to at least 100 words

Offer this if submitter is reluctant to change:

If you prefer and it is appropriate, we could add language from the abstract in the NSF Award found here: [NSF AWARD URL].

Keywords

We noticed that there were no keywords for this dataset. Adding keywords will help your dataset be discovered by others.

Data

Sensitive Data

We will need to ask these questions manually until the fields are added to the webform.

  1. Data sensitivity categories

Once we have the ontology this question can be asked:

Based on our data sensitivity categories, which of the following three does your dataset align with most?

  • Non-sensitive data - None of the data includes sensitive or protected information.

  • Some or all data is sensitive with minimal risk - Sensitive data has been de-identified, anonymized, aggregated, or summarized to remove sensitivities and enable safe data distribution. Examples include ensuring that human subjects data, protected species data, archaeological site locations and personally identifiable information have been properly anonymized, aggregated and summarized.

  • Some or all data is sensitive with significant risk - The data contains human subjects data or other sensitive data. Release of the data could cause harm or violate statutes, and must remain confidential following restrictions from an Institutional Review Board (IRB) or similar body.

  2. Ethical research procedures

We were wondering if you could also address this question specifically on Ethical Research Procedures: Describe how and the extent to which data collection procedures followed community standards for ethical research practices (e.g., CARE Principles). Be explicit about Institutional Review Board approvals, consent waivers, procedures for co-production, data sovereignty, and other issues addressing responsible and ethical research. Include any steps to anonymize, aggregate or de-identify the dataset, or to otherwise create a version for public distribution. We can help add your answers to the metadata.

Adding provenance

Are the [mention the file names here] files related? If so, we can add provenance to show the relationship between the files. Here is an example of how that is displayed: https://arcticdata.io/catalog/view/doi%3A10.18739%2FA2WS8HM6C#urn%3Auuid%3Af00c4d71-0242-4e9d-9745-8999776fa2f2

At least one data file

We noticed that no data files were submitted. With the exception of sensitive social science data, we seek to include all data products prior to publication. We wanted to check if additional files will be submitted before we move forward with the submission process.

Open formats

Example using xlsx. Tailor this response to the format in question.

We noticed that the submitted data files are in xlsx format. Please convert your files to a plain text/csv (or other open format); this helps ensure your data are usable in the long-term.

The data files can be replaced by clicking the green Edit button > clicking the black triangle by the Describe button for the data file > selecting Replace (a screenshot showing how to get there is also attached).

Zip files

Except for very specific file types, we do not recommend that data are archived in zip format. Data that are zipped together can present a barrier to reuse since it is difficult to accurately document each file within the zip file in a machine readable way.

File contents and relationships among files are clear

Could you provide a short description of the files submitted? Information about how each file was generated (what software, source files, etc.) will help us create more robust metadata for long term use.

Data layout

Would you be able to clarify how the data in your files is laid out? Specifically, what do the rows and columns represent?

We try not to prescribe how researchers must format their data, as long as the format is reasonable. However, in extreme cases (for example, Excel spreadsheets with data and charts all in one sheet), we will kindly ask them to reformat.

We would like to suggest a couple of modifications to the structure of your data. This will help others to re-use it most effectively. [DESCRIBE WHAT MAY NEED TO BE CHANGED IN THE DATA SET]. Our data submission guidelines page (https://arcticdata.io/submit/) outlines best practices for data submissions to the Arctic Data Center. Let us know if you have any questions or if we can be of any help.

Attributes

Identify which attributes need additional information. If they are common attributes like date and time we do not need further clarification.

Checklist for the datateam in reviewing attributes (NetCDF, CSV, shapefiles, or any other tabular datasets):

  • A name (often the column or row header in the file).
  • A complete definition.
  • Any missing value codes along with explanations for those codes.
  • For all numeric data, unit information is needed.
  • For all date-time data, a date-time format is needed (e.g. “DD-MM-YYYY”).
  • For text data, full descriptions for all patterns or code/definition pairs are needed if the text is constrained to a list of patterns or codes.
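For curators who like to script a first pass, the checklist above can be sketched roughly as follows. This is only an illustration: the dictionary layout and its key names (name, definition, type, unit, format, codes, missing_value_codes) are assumptions made for this example, not the Arctic Data Center's metadata schema, and the function is hypothetical.

```python
from typing import Dict, List


def missing_attribute_info(attribute: Dict) -> List[str]:
    """Return the checklist items that still need to be requested.

    The record layout is a hypothetical plain dict, used only for illustration.
    """
    problems = []
    if not attribute.get("name"):
        problems.append("name")
    if not attribute.get("definition"):
        problems.append("definition")

    kind = attribute.get("type")
    # Numeric data needs a unit; date-time data needs a format string.
    if kind == "numeric" and not attribute.get("unit"):
        problems.append("unit")
    if kind == "datetime" and not attribute.get("format"):
        problems.append("date-time format")
    # Text constrained to codes needs code/definition pairs.
    if kind == "coded_text" and not attribute.get("codes"):
        problems.append("code definitions")

    # A missing value code is only complete once it has an explanation.
    for code, explanation in attribute.get("missing_value_codes", {}).items():
        if not explanation:
            problems.append(f"explanation for missing value code {code!r}")
    return problems


# Made-up example attributes, not taken from a real submission.
attributes = [
    {"name": "depth", "definition": "Depth below lake surface", "type": "numeric"},
    {"name": "date", "definition": "Sampling date", "type": "datetime",
     "format": "DD-MM-YYYY"},
    {"name": "temp", "definition": "", "type": "numeric", "unit": "celsius",
     "missing_value_codes": {"-999": ""}},
]

for attr in attributes:
    gaps = missing_attribute_info(attr)
    if gaps:
        print(f"{attr['name']}: ask for {', '.join(gaps)}")
```

The printed list is only a starting point for the email; common attributes such as date and time may not need a question at all.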

Helpful templates:

We would like your help in defining some of the attributes. Could you write a short description or units for the attributes listed? [Provide the attribute names in list form]

Could you describe ____?

Please define “XYZ”, including the unit of measure.

What are the units of measurement for the columns labeled “ABC” and “XYZ”?

Missing value codes

What do the missing values in your measurements represent? A short description of the reason why the values are missing (instrument failure, site not found, etc.) will suffice. This section is not yet available on our webform so we will add that information on your behalf.

We noticed that the data files contain [blank cells - replace with missing values found]. What do these represent?

Funding

All NSF funded datasets need a funding number. Non-NSF funded datasets might not have funding numbers, depending on the funding organization.

We noticed that your dataset does not appear to contain a funding number. The field accepts NSF funding numbers as well as numbers from other organizations.

Methods

We noticed that methods were missing from the submission. Submissions should:

  • provide instrument names (if applicable)
  • specify how sampling locations were chosen
  • provide a brief summary of any sampling methods that are referenced by citation
  • list any software used to process the data

Note - this includes software submissions as well (see https://arcticdata.io/submit/#metadata-guidelines-for-software)

Your methods section appears to be missing some information. [ADJUST THIS DEPENDING ON WHAT IS MISSING - Users should be able to understand how the data were collected, how to interpret the values, and potentially how to use the data in the case of specialized files.]

Comprehensive methods information should be included directly in the metadata record. Pointers or URLs to other sites are unstable.

A full example - New Submission: methods, excel to csv, and attributes

Thank you for your recent submission to the NSF Arctic Data Center!

From my preliminary examination of your submission I have noticed a few items that I would like to bring to your attention. We are here to help you publish your submission, but your continued assistance is needed to do so. See comments below:

If there is a publication associated with this dataset, we would appreciate it if you could register the DOI of your published paper with us by using the Citations button right below the title at the dataset landing page. We are working to build our catalog of dataset citations in the Arctic Data Center. This can be added at any point.

We would like to add some more context to your data package title. I would like to suggest: Holocene lake-based Arctic glacier and ice cap records.

We noticed that the submitted data files are in xlsx format. Please convert your files to a plain text/csv (or other open format); this helps ensure your data are usable in the long-term.

We also would like to suggest a couple of modifications to the structure of your data. This will help others to re-use it most effectively. If the data were in a long rather than wide format, it would be easier for us to document.

Our data submission guidelines page (https://arcticdata.io/submit/) outlines best practices for data submissions to the Arctic Data Center. Let us know if you have any questions or if we can be of any help.

What do the missing values in your measurements represent? A short description of the reason why the values are missing (instrument failure, site not found, etc.) will suffice.

We noticed that methods were missing from the submission. Submissions should:

  • provide instrument names (if applicable)
  • specify how sampling locations were chosen
  • provide citations for sampling methods that are not explained in detail
  • list any software used to process the data

After we receive your responses, we can make the edits on your behalf, or you are welcome to make them yourself using our user interface.

Best,

[YOUR NAME]

Final email templates

Asking for approval

Hi [submitter],

I have updated your data package and you can view it here after logging in: [URL]

Please review and approve it for publishing or let us know if you would like anything else changed. For your convenience, if we do not hear from you within a week we will proceed with publishing with a DOI.

After publishing with a DOI, any further changes to the dataset will result in a new DOI. However, any previous DOIs will still resolve and point the user to the newest version.

Please let us know if you have any questions.

DOI and data package finalization comments

Replying to questions about DOIs

We attribute DOIs to data packages as one might give a DOI to a citable publication. Thus, a DOI is permanently associated with a unique and immutable version of a data package. If the data package changes, a new DOI will be created and the old DOI will be preserved with the original version.

DOIs and URLs for previous versions of data packages remain active on the Arctic Data Center (they will continue to resolve to the data package landing page for the specific version they are associated with), but a clear message will appear at the top of the page stating that “A newer version of this dataset exists” with a hyperlink to that latest version. With this approach, any past uses of a DOI (such as in a publication) will remain functional and will reference the specific version of the data package that was cited, while pointing users to the newest version if one exists.

Clarification of updating with a DOI and version control

We definitely support updating a data package that has already been assigned a DOI, but when we do so we mark it as a new revision that replaces the original and give it its own DOI. We do that so that any citations of the original version of the data package remain valid (i.e., after the update, people still know exactly which data were used in the work citing it).

Resolve the ticket

Sending finalized URL and dataset citation before resolving ticket

[NOTE: the URL format is very specific here, please try to follow it exactly (but substitute in the actual DOI of interest)]

Here is the link and citation to your finalized data package:

https://doi.org/10.18739/A20X0X

First Last, et al. 2021. Title. Arctic Data Center. doi:10.18739/A20X0X.

If in the future there is a publication associated with this dataset, we would appreciate it if you could register the DOI of your published paper with us by using the Citations button right below the title at the dataset landing page. We are working to build our catalog of dataset citations in the Arctic Data Center.

Please let us know if you need any further assistance.
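Because the link and citation formats in the template above are easy to get slightly wrong by hand, here is a minimal sketch of how they could be generated from a DOI. The function names and the loose DOI pattern are assumptions made for illustration; the output format simply mirrors the example lines in the template.

```python
import re

# Loose, illustrative check that the string looks like "10.<registrant>/<suffix>".
# This is an assumption for the sketch, not a full DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Strip common prefixes so only the bare DOI (10.xxxx/yyyy) remains."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def final_link_and_citation(raw_doi: str, authors: str, year: int, title: str):
    """Build the link and citation lines in the format shown in the template."""
    doi = normalize_doi(raw_doi)
    link = f"https://doi.org/{doi}"
    citation = f"{authors}. {year}. {title}. Arctic Data Center. doi:{doi}."
    return link, citation


# Placeholder values copied from the template, not a real data package.
link, citation = final_link_and_citation(
    "doi:10.18739/A20X0X", "First Last, et al", 2021, "Title"
)
print(link)
print(citation)
```

Accepting the DOI with or without a prefix means it can be pasted in whichever form it was copied, while the output always follows the one format the template asks for.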

Additional email templates

Deadlines

If the PI is checking about dates/timing:

[give rough estimate of time it might take]

Are you facing any deadlines? If so, we may be able to expedite publication of your submission.

Pre-assigned DOI

If the PI needs a DOI right away:

We can provide you with a pre-assigned DOI that you can reference in your paper, as long as your submission is not facing a deadline from NSF for your final report. However, please note that it will not become active until after we have finished processing your submission and the package is published. Once you have your dataset published, we would appreciate it if you could register the DOI of your published paper with us by using the citations button beside the orange lock icon. We are working to build our catalog of dataset citations in the Arctic Data Center.

Sensitive Data

Which of the following categories best describes the level of sensitivity of your data?

A. Non-sensitive data - None of the data includes sensitive or protected information. Proceed with uploading data.

B. Some or all data is sensitive but has been made safe for open distribution - Sensitive data has been de-identified, anonymized, aggregated, or summarized to remove sensitivities and enable safe data distribution. Examples include ensuring that human subjects data, protected species data, archaeological site locations and personally identifiable information have been properly anonymized, aggregated and summarized. Proceed with uploading data, but ensure that only data that are safe for public distribution are uploaded. Address questions about anonymization, aggregation, de-identification, and data embargoes with the data curation support team before uploading data. Describe these approaches in the Methods section.

C. Some or all data is sensitive and should not be distributed - The data contains human subjects data or other sensitive data. Release of the data could cause harm or violate statutes, and must remain confidential following restrictions from an Institutional Review Board (IRB) or similar body. Do NOT upload sensitive data. You should still upload a metadata description of your dataset that omits all sensitive information to inform the community of the dataset’s existence. Contact the data curation support team about possible alternative approaches to safely preserve sensitive or protected data.

  1. Ethical Research Procedures. Please describe how and the extent to which data collection procedures followed community standards for ethical research practices (e.g., CARE Principles). Be explicit about Institutional Review Board approvals, consent waivers, procedures for co-production, data sovereignty, and other issues addressing responsible and ethical research. Include any steps to anonymize, aggregate or de-identify the dataset, or to otherwise create a version for public distribution.

Asking for dataset access

As a security measure, we ask for approval from the original submitter of a dataset before granting edit permissions. This applies to all datasets.

No response from the researcher

Please email them before resolving a ticket like this:

We are resolving this ticket for bookkeeping purposes. If you would like to follow up, please feel free to respond to this email.

Recovering Dataset submissions

To recover dataset submissions that were not successful, please do the following:

  1. Go to https://arcticdata.io/catalog/drafts
  2. Find your dataset and download the corresponding file
  3. Send us the file in an email

Adding metadata via R

KNB does not support direct uploading of EML metadata files through the website (we have a webform that creates metadata), but you can upload your data and metadata through R.

Here are some training materials that use both the EML and datapack packages. They explain how to set your authentication token, build a package from metadata and data files, and publish the package to one of our test sites. I definitely recommend practicing on a test site prior to publishing to the production site your first time through. You can point to the KNB test node (dev.nceas.ucsb.edu) using this command: d1c <- D1Client("STAGING2", "urn:node:mnTestKNB")

If you prefer, there are Java, Python, MATLAB, and Bash/cURL clients as well.

Finding multiple data packages

If linking to multiple data packages, you can send a link to the profile associated with the submitter’s ORCID iD and it will display all their data packages. e.g.: https://arcticdata.io/catalog/profile/http://orcid.org/0000-0002-2604-4533
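As a small sketch, the profile link can be built from an ORCID iD like this. The URL pattern is taken from the example above; the function name is hypothetical, and the format check only looks at the shape of the iD (four groups of four characters, where the last character may be X), not its checksum.

```python
import re

# Shape of an ORCID iD: 0000-0000-0000-000X (last character is a digit or X).
ORCID_ID = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def profile_url(orcid: str) -> str:
    """Return the catalog profile link for a bare iD or a full orcid.org URL."""
    # Keep only the part after the last slash, so full URLs are accepted too.
    bare = orcid.strip().rsplit("/", 1)[-1]
    if not ORCID_ID.match(bare):
        raise ValueError(f"not an ORCID iD: {orcid!r}")
    # URL pattern copied from the example link in this section.
    return f"https://arcticdata.io/catalog/profile/http://orcid.org/{bare}"


print(profile_url("0000-0002-2604-4533"))
```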

NSF ARC data submission policy

Please find an overview of our submission guidelines here: https://arcticdata.io/submit/, and NSF Office of Polar Programs policy information here: https://www.nsf.gov/pubs/2016/nsf16055/nsf16055.jsp.

Investigators should upload their data to the Arctic Data Center (https://arcticdata.io), or, where appropriate, to another community endorsed data archive that ensures the longevity, interpretation, public accessibility, and preservation of the data (e.g., GenBank, NCEI). Local and university web pages generally are not sufficient as an archive. Data preservation should be part of the institutional mission and data must remain accessible even if funding for the archive wanes (i.e., succession plans are in place). We would be happy to discuss the suitability of various archival locations with you further. In order to provide a central location for discovery of ARC-funded data, a metadata record must always be uploaded to the Arctic Data Center even when another community archive is used.

Linking ORCID and LDAP accounts

First, create an account at orcid.org/register if you have not already. After the account registration is complete, log in to the KNB with your ORCID iD here: https://knb.ecoinformatics.org/#share. Next, hover over the icon on the top right and choose “My Profile”. Then, click the “Settings” tab and scroll down to “Add Another Account”. Enter your name or username from your Morpho account and select yourself (your name should populate as an option). Click the “+”. You will then need to log out of knb.ecoinformatics.org and then log back in with your old LDAP account (click “have an existing account”, and enter your Morpho credentials with the organization set to “unaffiliated”) to finalize the linkage between the two accounts. Navigate to “My Profile” and “Settings” to confirm the linkage.

After completing this, all of your previously submitted data packages should show up on your KNB “My Profile” page, whether you are logged in using your ORCID or Morpho account, and you will be able to submit data either using Morpho or our web interface.

Or, try reversing my instructions: log in first using your Morpho account (by clicking the “existing account” button and selecting organization “unaffiliated”), look for your ORCID account, then log out and back in with ORCID to confirm the linkage.