Overview

Welcome!

This workshop provides an overview of many of the packages in the tidyverse, a suite of packages for the R programming language. The tidyverse is a veritable universe of tools that no single workshop could hope to cover, so we take an introductory approach that focuses on the fundamentals of tidying data in R. We are always happy to improve workshop content, so please don’t hesitate to post an Issue on our GitHub repository if you see clear areas for improvement!

To maximize the value of this workshop to you, we recommend that you take the following steps before the day of the workshop. If anything is unclear, feel free to reach out to us; our contact information can be found in the “Content Creators” tab.

Programs to Install

R & RStudio

Install R and its more convenient (in our opinion) user interface, RStudio.

If you already have R, check that you have at least version 4.0.0 by running the following code:

version$version.string

If your version starts with a 3 (e.g., the above code returns “R version 3…”), please update R to make sure all packages behave as expected.
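If you would rather let R do the comparison, the check can be sketched with base R's getRversion(), which returns the running version as an object that can be compared against a version string. The variable name r_is_recent is only illustrative.

```r
# Compare the running R version against the 4.0.0 minimum recommended above
r_is_recent <- getRversion() >= "4.0.0"

if (r_is_recent) {
  message("R ", format(getRversion()), " is recent enough for this workshop.")
} else {
  message("R ", format(getRversion()), " is older than 4.0.0; please update R.")
}
```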

R Packages

Install the tidyverse and palmerpenguins R packages using the following code:

install.packages(c("tidyverse", "palmerpenguins"))
library(tidyverse)
library(palmerpenguins)

Please run the above code even if you already have these packages. Doing so updates them and ensures that your code aligns with the examples and challenges introduced during the workshop.
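Once installation finishes, one way to confirm that both packages are available is sketched below. It relies on base R's requireNamespace() and packageVersion(); the variable names needed and is_installed are illustrative.

```r
# Packages this workshop relies on
needed <- c("tidyverse", "palmerpenguins")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

if (all(is_installed)) {
  # Report the installed version of each package
  for (pkg in needed) {
    message(pkg, ": ", format(packageVersion(pkg)))
  }
} else {
  message("Still missing: ", paste(needed[!is_installed], collapse = ", "))
}
```

If anything is reported as missing, rerunning the installation code usually resolves it.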

Penguin Data

The data we’ll be using for this workshop comes from the palmerpenguins package, maintained by Allison Horst. The “penguins” dataset from this package contains size measurements for adult foraging penguins near Palmer Station, Antarctica. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station Long Term Ecological Research (LTER) Program. Let’s take a look at it!

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_…¹ body_…² sex    year
   <fct>   <fct>              <dbl>         <dbl>      <int>   <int> <fct> <int>
 1 Adelie  Torgersen           39.1          18.7        181    3750 male   2007
 2 Adelie  Torgersen           39.5          17.4        186    3800 fema…  2007
 3 Adelie  Torgersen           40.3          18          195    3250 fema…  2007
 4 Adelie  Torgersen           NA            NA           NA      NA <NA>   2007
 5 Adelie  Torgersen           36.7          19.3        193    3450 fema…  2007
 6 Adelie  Torgersen           39.3          20.6        190    3650 male   2007
 7 Adelie  Torgersen           38.9          17.8        181    3625 fema…  2007
 8 Adelie  Torgersen           39.2          19.6        195    4675 male   2007
 9 Adelie  Torgersen           34.1          18.1        193    3475 <NA>   2007
10 Adelie  Torgersen           42            20.2        190    4250 <NA>   2007
# … with 334 more rows, and abbreviated variable names ¹​flipper_length_mm,
#   ²​body_mass_g

The “penguins” dataset has 344 rows and 8 columns.

The columns are as follows:

species: a factor denoting penguin species (Adélie, Chinstrap and Gentoo)

island: a factor denoting island in Palmer Archipelago, Antarctica (Biscoe, Dream or Torgersen)

bill_length_mm: a number denoting bill length (millimeters)

bill_depth_mm: a number denoting bill depth (millimeters)

flipper_length_mm: an integer denoting flipper length (millimeters)

body_mass_g: an integer denoting body mass (grams)

sex: a factor denoting penguin sex (female, male)

year: an integer denoting the study year (2007, 2008, or 2009)
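To check these column descriptions against the data yourself, a short sketch along the following lines may help. It assumes the palmerpenguins package is installed; penguins_df is an illustrative name, and the package-qualified palmerpenguins::penguins is used so there is no ambiguity about which penguins object is meant.

```r
library(palmerpenguins)

# Use the package-qualified name to be explicit about the data source
penguins_df <- palmerpenguins::penguins

# Number of rows and columns
dim(penguins_df)

# First class of each column (factor, numeric, or integer)
vapply(penguins_df, function(col) class(col)[1], character(1))

# Categories stored in each factor column
levels(penguins_df$species)
levels(penguins_df$island)
levels(penguins_df$sex)
```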

This dataset is an example of tidy data: each variable is in its own column, each observation is in its own row, and each value is in its own cell. Most functions in tidyverse packages expect tidy data, although some of them exist specifically to help get data into tidy format! The penguins dataset is what we’ll use for all examples in this workshop, so be sure to install the palmerpenguins R package. The examples on this page were adapted from Allison Horst’s dplyr tutorial!
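To make the idea of tidy data concrete, here is a minimal sketch of reshaping an untidy table with pivot_longer() from the tidyr package (part of the tidyverse). The counts are made up purely for illustration and are not taken from the penguins dataset; the names untidy_counts, tidy_counts, and n_penguins are likewise hypothetical.

```r
library(tidyr)

# Made-up counts (not real data): the "year" variable is hidden in column names
untidy_counts <- data.frame(
  species = c("Adelie", "Gentoo"),
  `2007` = c(10, 8),
  `2008` = c(12, 9),
  check.names = FALSE
)

# Reshape so that each row is one species-year observation
tidy_counts <- pivot_longer(
  untidy_counts,
  cols = c(`2007`, `2008`),
  names_to = "year",
  values_to = "n_penguins"
)

tidy_counts
```

In the reshaped table, species, year, and n_penguins each have their own column, which is the layout most tidyverse functions expect.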

Websites to Visit

Supplemental Material

Although it is not necessary for attending the workshop, you are welcome to view the source files behind this workshop website in our GitHub repository.

Also, check out NCEAS’ Learning Hub for a complete list of workshops and trainings offered by NCEAS.