Introduction to vegbankr
This package is an R client for VegBank, the vegetation plot database
of the Ecological Society of America’s Panel on Vegetation Classification,
hosted by the National Center for
Ecological Analysis and Synthesis (NCEAS). VegBank contains
vegetation plot data, community types recognized by the U.S. National
Vegetation Classification and others, and all ITIS/USDA plant taxa along
with other taxa recorded in plot records. As a VegBank API client, the
vegbankr package currently supports querying and
downloading vegetation plot records and other supporting information
from the VegBank database, and supports validating and uploading new
data to the VegBank database as well.
Contributing data to VegBank
To upload data to VegBank, you must first request contributor permission from the ESA Vegetation Classification Panel. You can request to be a contributor by emailing help@vegbank.org and the panel will evaluate your request with the goal of maintaining high-quality vegetation data in the system. Once your contributor role is granted, you will be able to log in and upload new plot data with the vegbankr R package.
To use vegbankr to upload data, there are 3 key
steps:
- Model and transform your data to the VegBank Loader Table format
- Validate your data
- Upload your data using vb_upload_plot_observations(...)
This vignette will walk through these 3 steps, with an emphasis on modeling and validating data.
VegBank Loader Tables
Loader tables are the data format used to upload data into VegBank. To publish your data to VegBank, first model your data against the loader table format, then transform it into that format. Modeling your data means identifying how each piece of information in your original dataset (such as species names, plot locations, and survey dates) corresponds to specific fields in the VegBank loader table format. There are 12 loader tables that can be used for plot observation data, though not all are required for data ingest. This section describes the loader tables and their fields in the order in which we recommend preparing them.
Each loader table you create is an R data.frame that is
eventually passed to a VegBank upload function. In the documentation
below, interactive tables display the allowed fields (column
names), whether each field is required, best practice, commonly used, or
sometimes used, and a description of the field.
A number of fields act as codes and are used as
primary and foreign keys to link loader tables together. Codes that
begin with user_ are supplied by the data contributor.
Codes that begin with vb_ are created by the database upon
data upload.
Projects
This table stores information about a project established to collect
vegetation plot data. The user_pj_code is the project code
primary key, and is found as a foreign key in several other tables. An
example project code might be MOJA with project name
“Mojave Desert Vegetation Surveys.”
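As a minimal sketch, a projects loader table for the example above could be built as a plain data.frame. Only user_pj_code is named in the text; the column name project_name is an assumption made for illustration, so check the field list for the exact name.

```r
# Sketch of a projects loader table.
# NOTE: `project_name` is an assumed column name, not confirmed by the text.
projects <- data.frame(
  user_pj_code = "MOJA",
  project_name = "Mojave Desert Vegetation Surveys",
  stringsAsFactors = FALSE
)

# The project code is the primary key, so it should be present and unique.
stopifnot(
  !anyNA(projects$user_pj_code),
  anyDuplicated(projects$user_pj_code) == 0
)
```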
Parties
The Parties loader table is used to upload new parties (people)
associated with plots, projects, taxa, and classifications. The primary
key is user_py_code which is used as a foreign key in the
Contributors loader table. Once uploaded, VegBank will create a
vb_py_code for each party to be used in the Contributors
table.
Contributors
The Contributors loader table consists mostly of codes. It is closely linked to both the Projects table and the Parties table, and is used to link parties (people) with their contributions to plots, projects, taxa, and classifications.
user_py_code is a foreign key to the
parties table, so any values present in this field must be
present there as well. Optionally, instead of user_py_code,
vb_py_code can be used if the party is already in VegBank.
Exactly one of user_py_code or vb_py_code must be
present in each row; having both in the same row is disallowed.
vb_ar_code is the VegBank role code - a code in the
format ar.{nn}. A table of allowed values and their
meanings is listed below the loader table variables.
Finally, the contributor_type indicates whether this
contributor is linked to an Observation, Project, or Classification. The
record_identifier value will depend on which of these three
types the contributor is associated with. If the contributor should be
associated with a project with user_pj_code
MOJA, the record_identifier for that
contributor should be MOJA and the contributor type should
be Project. This transitively will also associate the contributor with
all plots associated with that project. If the contributor should only
be associated with a particular observation with
user_ob_code MOJA_0214, the record identifier
is that observation identifier, and the contributor_type
should be Observation.
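The rules above can be sketched with a small contributors table and a base R check. The field names come from the text; the party and role code values (py_001, py.123, ar.1, ar.2) are made-up placeholders, not real VegBank codes.

```r
# Sketch of a contributors loader table.
# NOTE: the party and role code values are placeholders for illustration.
contributors <- data.frame(
  user_py_code      = c("py_001", NA),
  vb_py_code        = c(NA, "py.123"),
  vb_ar_code        = c("ar.1", "ar.2"),
  contributor_type  = c("Project", "Observation"),
  record_identifier = c("MOJA", "MOJA_0214"),
  stringsAsFactors  = FALSE
)

# Exactly one of user_py_code / vb_py_code may be filled in per row.
has_user <- !is.na(contributors$user_py_code)
has_vb   <- !is.na(contributors$vb_py_code)

# Rows where both or neither are filled in violate the rule.
bad_rows <- which(!xor(has_user, has_vb))
bad_rows
```

An empty bad_rows means every row names its party through exactly one of the two code fields.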
Plot Observations
The plot observations loader table contains all data that is consistent across a plot. This includes information on the plot name, location, physical features, non-vegetation cover, etc. This table has many optional fields that may or may not be applicable to your project.
Similar to the pattern described in contributors, one of
vb_pl_code or user_pl_code is required, and
only one of those two fields may be used for each row.
vb_pl_code would only be used if the intention is to add a
new observation record to a plot that already exists in VegBank. These
codes are used as foreign keys in various other tables.
author_plot_code is often the same as these codes (e.g., MOJA_0214),
but they can differ for valid reasons. author_plot_code is
prominently displayed in the VegBank user interface as the plot
identifier.
user_ob_code is the primary key for an observation on a
plot, and may be the same as the plot code if there is only one
observation of each plot. If there are multiple observations on the same
plot, however, user_ob_code must be unique for each
observation.
user_pj_code or vb_pj_code can link a plot and its observation back
to a project. Values in these fields must be present either in the
projects loader table (user_pj_code) or in
VegBank, if the project is already uploaded
(vb_pj_code).
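The key rules for this table can be sketched as follows, using only the key fields named above. The observation code values are illustrative, and repeating user_pl_code across rows for a revisited plot is an assumption about how repeat visits are modeled.

```r
# Sketch of the key columns of a plot observations loader table.
# NOTE: the code values are illustrative; one plot is observed twice.
plot_observations <- data.frame(
  user_pl_code     = c("MOJA_0214", "MOJA_0214", "MOJA_0215"),
  author_plot_code = c("MOJA_0214", "MOJA_0214", "MOJA_0215"),
  user_ob_code     = c("MOJA_0214_2019", "MOJA_0214_2021", "MOJA_0215"),
  user_pj_code     = "MOJA",
  stringsAsFactors = FALSE
)

# Each observation needs its own user_ob_code, even when the plot repeats.
ob_codes <- plot_observations$user_ob_code
dup_obs  <- ob_codes[duplicated(ob_codes)]

# Every user_pj_code must exist in the projects loader table.
project_codes <- "MOJA"  # stands in for projects$user_pj_code
missing_pj    <- setdiff(plot_observations$user_pj_code, project_codes)
```

Both dup_obs and missing_pj should be empty before moving on to the next table.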
Community Classifications
The community classifications loader table contains the community classification of an observation.
The primary key is user_cl_code. The foreign key
user_ob_code corresponds to the key present in the plot
observations loader table, and is required. All values in this field
must also be present in the plot observations table.
vb_cc_code is the VegBank community concept identifier,
a required field, the value of which must already be present in VegBank.
To retrieve a list of possible vb_cc_code values, use the
vegbankr function
vb_get_community_concepts.
Strata Cover
This loader table contains the plant names, cover values, and strata
recorded in a plot observation. The user_ob_code
is a required foreign key that links each plant record to a plot observation.
All values in this field must be present in the plot observations loader
table. user_to_code is a key that is unique for each
combination of user_ob_code, plant name, and stratum in this
table. user_tm_code is a key that is unique for each
combination of user_ob_code and plant name, so
user_tm_code may be repeated in this table if a plant
occurs in multiple strata in a plot observation.
user_sr_code is a foreign key that corresponds to the
strata loader table, described below.
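One way to derive the two keys is to number the distinct combinations they describe. This is only a sketch: the columns plant_name, stratum, and cover are assumed names for illustration, and the tm_/to_ code format is arbitrary.

```r
# Hypothetical input: `plant_name`, `stratum`, and `cover` are assumed
# column names used only for this illustration.
strata_cover <- data.frame(
  user_ob_code = c("MOJA_0214", "MOJA_0214", "MOJA_0214"),
  plant_name   = c("Larrea tridentata", "Larrea tridentata", "Ambrosia dumosa"),
  stratum      = c("shrub", "herb", "shrub"),
  cover        = c(12, 1, 8),
  stringsAsFactors = FALSE
)

# user_tm_code: one value per observation + plant name combination.
tm_key <- paste(strata_cover$user_ob_code, strata_cover$plant_name, sep = "|")
strata_cover$user_tm_code <- paste0("tm_", match(tm_key, unique(tm_key)))

# user_to_code: one value per observation + plant name + stratum combination.
to_key <- paste(tm_key, strata_cover$stratum, sep = "|")
strata_cover$user_to_code <- paste0("to_", match(to_key, unique(to_key)))

strata_cover[, c("plant_name", "stratum", "user_tm_code", "user_to_code")]
```

Here the plant recorded in two strata shares a single user_tm_code across both rows, while every row receives its own user_to_code.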
Strata Methods
Each stratum must also have a VegBank strata method associated
with it. This is represented by vb_sy_code. To see
available codes, see the code snippet below the table. The strata method
is linked to an observation via the required user_ob_code
field. user_sr_code is a required identifier that provides
a key to each unique stratum observation in the strata cover table.
Taxon Interpretations
Taxon interpretations associate the plants in the Strata Cover table
with an existing VegBank plant concept code. To get a list of existing
plant concepts, use the vb_get_plant_concepts function.
Note that a person with a role is also required for this table, so one
of user_py_code (present in the parties loader
table) or vb_py_code (an existing VegBank party) must
exist, along with their role, in vb_ar_code.
Disturbances
The disturbances loader table contains information about disturbances
observed at a plot, such as fire, grazing, logging, or other events that
have impacted the vegetation. The primary key is
user_do_code. The foreign key user_ob_code
links each disturbance record to a specific plot observation and is
required. type describes the kind of disturbance and is a
required field. This field is a closed list in VegBank, with options
listed below.
Below are the allowed disturbance types:
Soils
The Soils loader table is used to describe soils collected from a plot. This includes information on soil horizons, texture, color, depth, and chemical properties.
The primary key is user_so_code, which can be a simple
row number. The foreign key user_ob_code, when used, links each
soil record to a specific plot observation in the plot observations
table. horizon is a required field that identifies the soil horizon
being described.
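A short sketch of a soils table that uses row numbers as the primary key. The horizon values shown are placeholders and may not match the values VegBank accepts.

```r
# Sketch of a soils loader table with two horizons from one observation.
# NOTE: the horizon values are placeholders for illustration.
soils <- data.frame(
  user_ob_code = c("MOJA_0214", "MOJA_0214"),
  horizon      = c("A", "B"),
  stringsAsFactors = FALSE
)

# Use the row number as a simple, unique primary key.
soils$user_so_code <- seq_len(nrow(soils))

# horizon is required, so no row may leave it empty.
stopifnot(!anyNA(soils$horizon))
```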
Stem Data
The Stem Data loader table is used to describe individual plant stems measured at a plot. This table supports detailed tree/shrub demographic data collection, including stem diameter, height, location, and health status.
The primary key is user_sc_code, which is the stem count
identifier. The required foreign key user_tm_code links
each stem record to a specific taxon observation in the strata cover
table, associating stems with their species identification.
Data Validation
Validating that your data conform to the VegBank schema is an important step toward a successful data upload. The database validates data before ingest, and the API performs some validation as well, but users often benefit from easy-to-read validation results before even attempting an upload.
The vb_validate family of functions will check for the
presence of required fields, unique fields, and cross-check required
foreign keys across tables.
To validate your plot observations data before submitting, you pass
all of your loader tables as data.frames to the appropriate
arguments in vb_validate_plot_observations.
vb_validate_plot_observations(plot_observations = plots,
                              projects = projects,
                              parties = party,
                              contributors = contrib,
                              disturbances = dist,
                              community_classifications = comm,
                              strata_cover_data = strata_cover,
                              taxon_interpretations = tax,
                              strata = strata)
If there are issues in the data, the validator will return output that looks like this:
✖ disturbances.user_ob_code values not found in plot_observations.user_ob_code: DO001, DO002, DO003
ℹ soils table not provided - skipping validation
ℹ stem_data table not provided - skipping validation
ℹ references table not provided - skipping validation
In this example, user_ob_code in the
disturbances table contains values that are not found in
user_ob_code in plot_observations. As
described in the disturbances section of the loader tables requirements
above, all values in this foreign key must be present in
user_ob_code in the plot observations loader table.
To correct this mistake, you would need to return to the code that
performed the data modeling to determine the cause of the issue.
It could be that the wrong column in the original data was mapped
to one of the two user_ob_code fields, or it could be that
capitalization or whitespace differences prevent the same code from
being recognized as equivalent across tables. Note that all validation
checks are case- and whitespace-sensitive.
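When tracking down a mismatch like this, a quick base R comparison can show whether the problem is formatting or a wrong mapping. This is a diagnostic sketch on toy data, not part of vegbankr; the exact comparison mirrors the case- and whitespace-sensitive behavior described above.

```r
# Toy tables: the disturbance codes differ from the plot codes only by a
# trailing space and by letter case.
plots <- data.frame(user_ob_code = c("MOJA_0214", "MOJA_0215"),
                    stringsAsFactors = FALSE)
dist  <- data.frame(user_ob_code = c("MOJA_0214 ", "moja_0215"),
                    stringsAsFactors = FALSE)

# Exact comparison: both disturbance codes are reported as unmatched.
unmatched <- setdiff(dist$user_ob_code, plots$user_ob_code)

# Diagnostic only: normalize case and whitespace to see whether the
# mismatches are formatting problems rather than wrong codes.
normalize <- function(x) toupper(trimws(x))
still_unmatched <- setdiff(normalize(dist$user_ob_code),
                           normalize(plots$user_ob_code))

length(unmatched)        # codes failing the exact comparison
length(still_unmatched)  # codes still failing after normalization
```

If still_unmatched is empty, cleaning the codes in the source data should fix the problem. If codes remain, the wrong column was probably mapped.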
Data Upload
Once data are validated, you are ready to try to upload data. First you need to point your R session to the correct VegBank instance and set a token.
To do a test upload, point to the test instance.
vb_set_base_url("https://api-dev.vegbank.org")
Next, get a token by logging into http://api-dev.vegbank.org/login.
After logging in, you should see a JSON document that contains an access
token and refresh token. The easiest way to get this information into R
is to select the “Raw Data” option in your browser (if available) to get
the plain text JSON. Copy this to your clipboard, and paste it into R to
save to the variable token. You’ll need to enclose this
string in single quotes: the JSON itself contains double quotes, so
wrapping it in double quotes will give a syntax error.
token <- '{"message":"Authorization successful","token":{"access_token":".......","refresh_token":"......."}}'
Then set your token using:
vb_set_token(tokens = token)
From here, uploading is easy using the
vb_upload_plot_observations() function. This function takes
as arguments data frames for each of the loader tables described above.
Note that not all loader tables are required.
To run the function, you’ll take the same data.frames
used in the validation section and pass them as arguments to the
function like below. Note the dry_run argument. This is a
way to do one final round of validation before inserting the data into
VegBank. Setting a dry run allows the API to go through all of the steps
except the final insert call. If the dry run is successful, the output
will report that rows were inserted, even though nothing was actually
written. If it is not successful, it will raise an error, and more work
is needed to ensure the loader tables conform to the required schema.
vb_upload_plot_observations(plot_observations = plots,
                            projects = projects,
                            parties = party,
                            contributors = contrib,
                            disturbances = dist,
                            community_classifications = comm,
                            strata_cover_data = strata_cover,
                            taxon_interpretations = tax,
                            strata = strata,
                            soils = soils,
                            dry_run = TRUE)
Once you get a successful dry run, set dry_run to
FALSE to complete your upload!