Reproducible Data Access

intermediate

Accessing data is the first step in any analysis. Doing so in a portable way is key to making your work as reproducible as possible.
Image credit: NCEAS Learning Hub

Description

Traditional ways of working with data – as files on a specific computer – limit code reproducibility to local computer environments. A typical R analysis file loads one or more data files from the local disk, but those file paths rarely work on a colleague’s machine. Alternatively, we can access data over the Internet with a URL. But what happens if the URL changes, as URLs often do? This lesson teaches you about pins and content identifiers and how to use them to access your data in a truly reproducible way.
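To make the idea of a content identifier concrete, here is a minimal, language-neutral sketch in Python using only the standard library (the lesson itself works in R). It assumes a content identifier is a cryptographic hash of the file's bytes written as a hash URI. The function name `content_identifier` and the sample CSV bytes are hypothetical, chosen only for illustration.

```python
import hashlib


def content_identifier(data: bytes) -> str:
    """Return a hash URI computed only from the bytes themselves.

    Assumption: identifiers take the form hash://sha256/<hex digest>.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"hash://sha256/{digest}"


# Made-up sample bytes standing in for the same file read from a local
# disk and from a URL, plus a copy in which one value was edited.
local_copy = b"site,year,count\nA,2020,14\n"
remote_copy = b"site,year,count\nA,2020,14\n"
edited_copy = b"site,year,count\nA,2020,15\n"

# Identical bytes give the identical identifier, wherever they are stored.
same = content_identifier(local_copy) == content_identifier(remote_copy)

# Any change to the bytes gives a different identifier.
changed = content_identifier(local_copy) != content_identifier(edited_copy)

print(same, changed)
```

Because the identifier depends only on the content, it stays valid when a file moves to a new path or URL, and it no longer matches if the data are silently altered.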

Prerequisites

Familiarity with R and RStudio
Basic knowledge of data packages and data repositories

Learning Goals

List best practices for reproducible data access
Access data on the web and with pins
Explain how content identifiers differ from DOIs and make research more reproducible
Demonstrate ways to register and resolve content identifiers for unpublished data
Identify how content identifiers can resolve to a published data source

Duration

1 hour