Reproducible Data Access
Accessing data is the first step in any analysis, and doing so in a portable way is key to making your work as reproducible as possible.
Image credit: NCEAS Learning Hub
Description
Traditional ways of working with data, as files on a specific computer, limit code reproducibility to the local computer environment. A typical R analysis file loads one or more data files from the local disk, but those file paths rarely work on a colleague’s machine. Alternatively, we can access data over the Internet with a URL. But what happens if the URL changes, as URLs often do? This lesson teaches you about pins and content identifiers and how to use them to access your data in a truly reproducible way.
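To make the idea of a content identifier concrete, here is a minimal sketch in base R, not the lesson's own method. It assumes an identifier can be formed by hashing a file's bytes. The helper `content_hash_id()` and the `hash://md5/` prefix are hypothetical names chosen for illustration. MD5 is used only because `tools::md5sum()` ships with base R; stronger hashes such as SHA-256 are more common in practice.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
path_a <- tempfile(fileext = ".csv")
writeLines(c("site,count", "A,3", "B,5"), path_a)

# Copy it to a second location, as if a colleague stored it somewhere else.
path_b <- tempfile(fileext = ".csv")
file.copy(path_a, path_b)

# Hypothetical helper (an assumption for illustration): build an identifier
# from the file's bytes rather than from its location.
content_hash_id <- function(path) {
  paste0("hash://md5/", unname(tools::md5sum(path)))
}

id_a <- content_hash_id(path_a)
id_b <- content_hash_id(path_b)

# Same bytes give the same identifier, whatever the path.
identical(id_a, id_b)  # TRUE

# Changing the content changes the identifier.
writeLines(c("site,count", "A,3", "B,6"), path_b)
identical(id_a, content_hash_id(path_b))  # FALSE
```

Because the identifier depends only on the bytes, it stays valid when a file is moved or its URL changes, and it stops matching as soon as the data itself is altered.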
Prerequisites
- Be familiar with R and RStudio
- Basic knowledge of data packages and data repositories
Learning Goals
- List best practices for reproducible data access
- Access data on the web and with pins
- Explain how content identifiers differ from DOIs and make research more reproducible
- Demonstrate ways to register and resolve content identifiers for unpublished data
- Identify how content identifiers can resolve to a published data source
Duration
1 hour