Uploading plot data with vegbankr

Introduction to `vegbankr`

This package is an R client for VegBank, the vegetation plot database of the Ecological Society of America’s Panel on Vegetation Classification, hosted by the National Center for Ecological Analysis and Synthesis (NCEAS). VegBank contains vegetation plot data, community types recognized by the U.S. National Vegetation Classification and others, and all ITIS/USDA plant taxa along with other taxa recorded in plot records. As a VegBank API client, the vegbankr package currently supports querying and downloading vegetation plot records and other supporting information from the VegBank database, as well as validating and uploading new data to it.

Contributing data to VegBank

To upload data to VegBank, you must first request contributor permission from the ESA Vegetation Classification Panel. You can request to be a contributor by emailing help@vegbank.org and the panel will evaluate your request with the goal of maintaining high-quality vegetation data in the system. Once your contributor role is granted, you will be able to log in and upload new plot data with the vegbankr R package.

To use vegbankr to upload data, there are 3 key steps:

1. Model and transform your data to the VegBank Loader Table format
2. Validate your data
3. Upload your data using vb_upload_plot_observations(...)

This vignette will walk through these 3 steps, with an emphasis on modeling and validating data.

VegBank Loader Tables

Loader tables are the data format used to upload data into VegBank. To publish your data to VegBank, the first step is to model your data against the loader table format, and then transform the data into that format. Modeling your data means identifying how each piece of information in your original dataset (like species names, plot locations, survey dates) corresponds to specific fields in the VegBank loader table format. There are a total of 12 loader tables that can be used for plot observation data, though not all are required for data ingest. This section describes the loader tables and their fields in the recommended order of preparation.

Each loader table you create is an R data.frame that is eventually passed to a VegBank upload function. In the documentation below, interactive tables display the allowed fields (column names), whether each field is required, best practice, commonly used, or sometimes used, and a description of the field.

A number of fields act as codes and are used as primary and foreign keys to link loader tables together. Codes that begin with user_ are supplied by the data contributor. Codes that begin with vb_ are created by the database upon data upload.

Projects

This table stores information about a project established to collect vegetation plot data. The user_pj_code is the project code primary key, and is found as a foreign key in several other tables. An example project code might be MOJA with project name “Mojave Desert Vegetation Surveys.”

Parties

The Parties loader table is used to upload new parties (people) associated with plots, projects, taxa, and classifications. The primary key is user_py_code which is used as a foreign key in the Contributors loader table. Once uploaded, VegBank will create a vb_py_code for each party to be used in the Contributors table.

Contributors

The contributors loader table is fairly code-heavy, but is closely linked to both the Projects table and the Parties table, and is used to link parties (people) with their contributions to plots, projects, taxa, and classifications.

user_py_code is a foreign key to the parties table, so any values present in this field must be present there as well. Optionally, instead of user_py_code, vb_py_code can be used if the party is already in VegBank. Exactly one of user_py_code or vb_py_code must be present in each row; having both in the same row is disallowed.
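The exactly-one rule can be checked in base R before running the validator. The following is a minimal sketch, not part of vegbankr: the helper name has_exactly_one_code and all example values are hypothetical, and it assumes a missing code is stored as NA or an empty string.

```r
# Hypothetical helper (not part of vegbankr): TRUE for rows where exactly one
# of the two party code columns is filled in.
has_exactly_one_code <- function(df,
                                 user_col = "user_py_code",
                                 vb_col = "vb_py_code") {
  # A column that is absent from the table counts as empty for every row.
  filled <- function(col) {
    if (!col %in% names(df)) return(rep(FALSE, nrow(df)))
    x <- as.character(df[[col]])
    !is.na(x) & nzchar(trimws(x))
  }
  xor(filled(user_col), filled(vb_col))
}

contrib_example <- data.frame(
  user_py_code = c("PY001", NA, "PY003", NA),
  vb_py_code   = c(NA, "py.123", "py.456", NA),  # made-up codes
  stringsAsFactors = FALSE
)

ok <- has_exactly_one_code(contrib_example)
contrib_example[!ok, ]  # rows with both codes or neither need fixing
```

The same helper could be reused for other user_/vb_ code pairs by changing the two column-name arguments.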

vb_ar_code is the VegBank role code - a code in the format ar.{nn}. A table of allowed values and their meanings is listed below the loader table variables.

Finally, the contributor_type indicates whether this contributor is linked to an Observation, Project, or Classification. The record_identifier value will depend on which of these three types the contributor is associated with. If the contributor should be associated with a project with user_pj_code MOJA, the record_identifier for that contributor should be MOJA and the contributor type should be Project. This transitively will also associate the contributor with all plots associated with that project. If the contributor should only be associated with a particular observation with user_ob_code MOJA_0214, the record identifier is that observation identifier, and the contributor_type should be Observation.
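As an illustration of the two linkage cases described above, a contributors loader table might be sketched as follows. The column names are those named in this section; every value is made up, including the role codes, and the exact spelling of the contributor_type values is an assumption based on the text above.

```r
# Illustrative contributors loader table; all values are placeholders.
contributors_example <- data.frame(
  user_py_code      = c("PY001", "PY002"),
  vb_ar_code        = c("ar.1", "ar.2"),            # placeholder codes in ar.{nn} form
  contributor_type  = c("Project", "Observation"),
  record_identifier = c("MOJA", "MOJA_0214"),       # a user_pj_code, then a user_ob_code
  stringsAsFactors  = FALSE
)

# contributor_type should be one of the three linkage types named above
# (spelling assumed).
allowed_types <- c("Observation", "Project", "Classification")
types_ok <- all(contributors_example$contributor_type %in% allowed_types)
types_ok
```

Here the first party is linked to the whole MOJA project, and the second only to the single observation MOJA_0214.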

Plot Observations

The plot observations loader table contains all data that is consistent across a plot. This includes information on the plot name, location, physical features, non-vegetation cover, etc. This table has many optional fields that may or may not be applicable to your project.

Similar to the pattern described in contributors, exactly one of vb_pl_code or user_pl_code is required in each row. vb_pl_code should only be used to add a new observation record to a plot that already exists in VegBank. These codes are used as foreign keys in various other tables. author_plot_code is often the same as these codes (e.g., MOJA_0214), but there can be valid reasons for them to differ. author_plot_code is prominently displayed in the VegBank user interface as the plot identifier.

user_ob_code is the primary key for an observation on a plot, and may be the same as the plot code if there is only one observation of each plot. If there are multiple observations on the same plot, however, user_ob_code must be unique for each observation.
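A quick base R check for duplicated observation codes might look like the sketch below. The data are invented, and the layout (one row per observation, with user_pl_code repeated for a revisited plot) is an assumption about how repeat observations would be arranged.

```r
# Invented example: plot MOJA_0214 was observed twice, MOJA_0215 once.
plots_example <- data.frame(
  user_pl_code = c("MOJA_0214", "MOJA_0214", "MOJA_0215"),
  user_ob_code = c("MOJA_0214_2019", "MOJA_0214_2023", "MOJA_0215"),
  stringsAsFactors = FALSE
)

# Any user_ob_code that appears more than once violates the primary key.
dup_ob <- unique(plots_example$user_ob_code[duplicated(plots_example$user_ob_code)])
ob_codes_unique <- length(dup_ob) == 0

# Number of observations recorded for each plot.
obs_per_plot <- table(plots_example$user_pl_code)
```

If dup_ob is non-empty, it lists the codes that need to be made unique before validation.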

user/vb_pj_code can link a plot and its observation back to a project. Values in these fields must be present either in the projects loader table (user_pj_code) or VegBank if the project is already uploaded (vb_pj_code).

Community Classifications

The community classifications loader table contains the community classification of an observation.

The primary key is user_cl_code. The foreign key user_ob_code corresponds to the key present in the plot observations loader table, and is required. All values in this field must also be present in the plot observations table.

vb_cc_code is the VegBank community concept identifier, a required field, the value of which must already be present in VegBank. To retrieve a list of possible vb_cc_code values, use the vegbankr function vb_get_community_concepts.

Strata Cover

This loader table contains data from a plot observation of the plant names, cover, and strata in a given plot. The user_ob_code is a required foreign key that links the plant to a plot observation. All values in this field must be present in the plot observations loader table. user_to_code is a key that is unique for each combination of user_ob_code, plant name, and strata in this table. user_tm_code is a key that is unique for each combination of user_ob_code and plant name - so user_tm_code may be repeated in this table if a plant exists in multiple strata in a plot observation. user_sr_code is a foreign key that corresponds to the strata loader table, described below.
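One way to derive the two keys from the combinations described above is sketched here in base R. This is only an illustration: the column name plant_name and all values are hypothetical, and any scheme that produces keys with the same uniqueness properties would serve equally well.

```r
# Invented strata cover rows; plant_name is a hypothetical column name.
cover <- data.frame(
  user_ob_code = c("MOJA_0214", "MOJA_0214", "MOJA_0214"),
  plant_name   = c("Larrea tridentata", "Larrea tridentata", "Ambrosia dumosa"),
  user_sr_code = c("SR_shrub", "SR_herb", "SR_shrub"),  # made-up stratum keys
  stringsAsFactors = FALSE
)

# user_tm_code: one value per (observation, plant name) pair, so it repeats
# when the same plant occurs in more than one stratum.
tm_key <- paste(cover$user_ob_code, cover$plant_name, sep = "|")
cover$user_tm_code <- paste0("TM", match(tm_key, unique(tm_key)))

# user_to_code: one value per (observation, plant name, stratum) combination.
to_key <- paste(tm_key, cover$user_sr_code, sep = "|")
cover$user_to_code <- paste0("TO", match(to_key, unique(to_key)))

cover[, c("plant_name", "user_sr_code", "user_tm_code", "user_to_code")]
```

In this example the two Larrea tridentata rows share a user_tm_code but receive distinct user_to_code values.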

Strata Methods

Each stratum value must also have a VegBank strata method associated with it. This is represented by vb_sy_code. To see available codes, run the code snippet below. The strata method is linked to an observation via the required user_ob_code field. user_sr_code is a required identifier that provides a key to each unique stratum observation in the strata cover table.

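library(vegbankr)
library(dplyr)  # %>%, mutate(), rename()
library(tidyr)  # unnest()
# Retrieve stratum methods and expand the nested stratum types into rows
vb_strata <- vb_get_stratum_methods(with_nested = TRUE) %>%
  unnest(stratum_types) %>%
  mutate(stratum_index = tolower(stratum_index)) %>%
  rename(Stratum = stratum_index)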

Taxon Interpretations

Taxon interpretations associate the plants in the Strata Cover table with an existing VegBank plant concept code. To get a list of existing plant concepts, use the vb_get_plant_concepts function. Note that a person with a role is also required for this table, so one of user_py_code (present in the parties loader table) or vb_py_code (an existing VegBank party) must exist, along with their role, in vb_ar_code.

Disturbances

The disturbances loader table contains information about disturbances observed at a plot, such as fire, grazing, logging, or other events that have impacted the vegetation. The primary key is user_do_code. The foreign key user_ob_code links each disturbance record to a specific plot observation and is required. type describes the kind of disturbance and is a required field. This field is a closed list in VegBank, with options listed below.

Below are the allowed disturbance types:

Soils

The Soils loader table is used to describe soils collected from a plot. This includes information on soil horizons, texture, color, depth, and chemical properties.

The primary key is user_so_code, which can be a simple row number. The foreign key user_ob_code, when used, links each soil record to a specific plot observation in the plot observations table. horizon is a required field that identifies the soil horizon being described.

Stem Data

The Stem Data loader table is used to describe individual plant stems measured at a plot. This table supports detailed tree/shrub demographic data collection, including stem diameter, height, location, and health status.

The primary key is user_sc_code, which is the stem count identifier. The required foreign key user_tm_code links each stem record to a specific taxon observation in the strata cover table, associating stems with their species identification.

Data Validation

Validating that your data conform to the VegBank schema is an important step toward a successful data upload. Validation does occur before ingest into the database, and some is also done by the API, but users often benefit from easy-to-read validation results before even attempting a data upload.

The vb_validate family of functions will check for the presence of required fields, unique fields, and cross-check required foreign keys across tables.

To validate your plot observations data before submitting, you pass all of your loader tables as data.frames to the appropriate arguments in vb_validate_plot_observations.

vb_validate_plot_observations(plot_observations = plots,
                              projects = projects,
                              parties = party,
                              contributors = contrib,
                              disturbances = dist,
                              community_classifications = comm,
                              strata_cover_data = strata_cover,
                              taxon_interpretations = tax,
                              strata = strata)

If there are issues in the data, the validator will return output that looks like this:

✖ disturbances.user_ob_code values not found in plot_observations.user_ob_code: DO001, DO002, DO003
ℹ soils table not provided - skipping validation
ℹ stem_data table not provided - skipping validation
ℹ references table not provided - skipping validation

In this example, user_ob_code in the disturbances table contains values that are not found in user_ob_code in plot_observations. As described in the disturbances section of the loader tables requirements above, all values in this foreign key must be present in user_ob_code in the plot observations loader table.

To correct this mistake, return to the code that performed the data modeling and determine the cause. It could be that the wrong column in the original data was mapped to user_ob_code in one of the two tables, or that capitalization or whitespace differences prevent the same code from being recognized as equivalent across tables. Note that all validation checks are case- and whitespace-sensitive.
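To tell a formatting problem apart from a genuinely wrong mapping, it can help to compare the keys twice: once exactly, and once after normalizing case and whitespace. The sketch below uses base R only, with invented codes; the helper normalize is hypothetical and is meant for diagnosis, not as a description of how the vegbankr validator works internally.

```r
# Invented keys: two differ from the plot codes only by case or trailing space.
plot_codes <- c("MOJA_0214", "MOJA_0215")
dist_codes <- c("moja_0214", "MOJA_0215 ", "DO003")

# Exact, case- and whitespace-sensitive comparison: all three are unmatched.
missing_exact <- setdiff(dist_codes, plot_codes)

# Hypothetical helper used only for diagnosis: trim and upper-case the codes.
normalize <- function(x) toupper(trimws(x))

# Codes that match once normalized point to a formatting issue ...
formatting_issue <- missing_exact[normalize(missing_exact) %in% normalize(plot_codes)]
# ... while the rest suggest the wrong column or values were mapped.
mapping_issue <- setdiff(missing_exact, formatting_issue)
```

Codes in formatting_issue can usually be fixed by cleaning the source column, while codes in mapping_issue call for revisiting how the field was modeled.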

Data Upload

Once data are validated, you are ready to try to upload data. First you need to point your R session to the correct VegBank instance and set a token.

To do a test upload, point to the test instance.

vb_set_base_url("https://api-dev.vegbank.org")

Next, get a token by logging into http://api-dev.vegbank.org/login. After logging in, you should see a JSON document that contains an access token and a refresh token. The easiest way to get this information into R is to select the “Raw Data” option in your browser (if available) to get the plain text JSON. Copy this to your clipboard, and paste it into R to save to the variable token. You’ll need to enclose this string in single quotes; because the JSON itself contains double quotes, wrapping it in double quotes will give a syntax error.

token <- '{"message":"Authorization successful","token":{"access_token":".......","refresh_token":"......."}}'

Then set your token using:

vb_set_token(tokens = token)

From here, uploading is easy using the vb_upload_plot_observations() function. This function takes as arguments data frames for each of the loader tables described above. Note that not all loader tables are required.

To run the function, pass the data.frames you validated as arguments to the function, as shown below. Note the dry_run argument, which provides one final round of validation before inserting the data into VegBank. A dry run lets the API go through all of the steps except the very last insert call. If the dry run is successful, the output will report that rows were inserted (but they weren’t!). If it is not successful, the function will error, and more work is needed to ensure the loader tables conform to the required schema.

vb_upload_plot_observations(plot_observations = plots_semi,
                            projects = projects,
                            parties = party,
                            contributors = contrib_semi,
                            disturbances = dist_semi,
                            community_classifications = comm_semi,
                            strata_cover_data = strat_semi,
                            taxon_interpretations = tax_semi,
                            strata = strat_defs_semi,
                            soils = soils,
                            dry_run = TRUE)

Once you get a successful dry run, set dry_run to FALSE to complete your upload!
