Overview
The scicomptools
package is built to house
self-contained tools built by the NCEAS Scientific Computing Team. These
functions are written based on the needs of various synthesis working
groups so each function operates more or less independently of the
others. We hope these tools will be valuable to users in and outside of
the context for which they were designed!
This vignette describes some of the main functions of
scicomptools
using the examples.
Extract Summary Statistics Tables
Our most requested task–by a wide margin–is to help streamline an
analytical workflow by stripping the summary statistics (e.g., test
statistic, P value, degrees of freedom, etc.) from whichever statistical
test the group has decided to use. As we’ve developed extraction
workflows for various tests we have centralized them in the
stat_extract
function.
Currently, stat_extract
accepts model fit objects from
lmerTest::lmer
, stats::lm
,
stats::nls
, stats::t.test
,
nlme::lme
, ecodist::MRM
, and
RRPP::trajectory.analysis
. All model fits should be passed
to the mod_fit
argument. If the model is a trajectory
analysis, the traj_angle
argument must be supplied as well
(either “deg” or “rad” for whether the multivariate angle should be
expressed in degrees or radians). Once the models are fit, they can be
passed to stat_extract
for extraction of typically-desired
summary values.
Note: stat_extract
determines the model
type automatically so this need not be specified.
# Fit models of several accepted types
## Student's t-Test
mod_t <- t.test(x = 1:10, y = 7:20)
## Nonlinear Least Squares
mod_nls <- fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), data = DNase)
# Extract the relevant information
## t-Test
scicomptools::stat_extract(mod_fit = mod_t)
#> Estimate DF T_Value P_Value
#> 1 5.5 21.98221 -5.43493 1.855282e-05
#> 2 13.5 21.98221 -5.43493 1.855282e-05
## NLS
scicomptools::stat_extract(mod_fit = mod_nls)
#> Term Estimate Std_Error T_Value P_Value
#> 1 Asym 2.485319 0.06287043 39.53081 1.519039e-88
#> 2 xmid 1.518117 0.06396583 23.73325 2.704647e-56
#> 3 scal 1.098307 0.02442013 44.97549 2.203937e-97
As more tests are used by working groups we plan on adding them to
the set of model objects that stat_extract
supports. Feel
free to post a
GitHub issue if you have a model type in mind that you’d like to
request we expand stat_extract
to support.
Create Google Drive Table of Contents
Google Drive is by far the most common platform that the working
groups we support leverage to store data. The googledrive
package makes integrating script inputs and outputs with specified Drive
folders easy but the internal folder structure of these Google Drives
often becomes convoluted as projects evolve and priorities shift.
We wrote the drive_toc
function to identify all folders
in a given Drive folder (either the top-level link or any sub-folder
link will work) and create a diagram of this hierarchy to make it
simpler for group’s to visualize their Drive’s “table of contents”.
This function has a url
argument which requires a Drive
ID (i.e., a link passed through googledrive::as_id
). There
is also an ignore_names
argument that allows users to
specify folder names that they would like excluded from the table of
contents; this is useful if many sub-folders contain a “deprecated” or
“archive” folder as this would greatly clutter the full table of
contents to include repeated folders like these. Finally, this process
can take a few seconds for complicated Drives so a message is returned
by default for which folder the function is identifying sub-folders. The
quiet
argument defaults to FALSE
so that
progress is reported but it can be set to TRUE
if you
desire no message be returned.
Read in Multiple Excel Sheets
Working groups often store their data in Microsoft Excel files with
multiple sheets (often one sheet per data type or per collection
location). We wrote read_xl_sheets
to import every sheet in
a user-specified MS Excel file and store each sheet in a list. If every
sheet has the same columns, you can then easily unlist them into a flat
dataframe (rather than the somewhat laborious process of reading in each
sheet separately).
read_xl_sheets
only has a single argument
(file_name
) which accepts the name of (and path to) the MS
Excel file to read.
# Read in sheets
sheet_list <- scicomptools::read_xl_sheets(file_name = system.file("extdata", "faux_data.xlsx",
package = "scicomptools"))
# Show structure
dplyr::glimpse(sheet_list)
#> List of 2
#> $ my_data: tibble [5 × 5] (S3: tbl_df/tbl/data.frame)
#> ..$ col_1: chr [1:5] "a" "b" "c" "d" ...
#> ..$ col_2: chr [1:5] "A" "A" "A" "B" ...
#> ..$ col_3: num [1:5] 2 43 2 2 2
#> ..$ col_4: num [1:5] 234 435 32 45 4
#> ..$ col_5: num [1:5] 236 478 34 47 6
#> $ data_2 : tibble [2 × 2] (S3: tbl_df/tbl/data.frame)
#> ..$ col_2 : chr [1:2] "A" "B"
#> ..$ temperature: num [1:2] 10 20
We also wrote a companion function named read_xl_format
where the specific formatting of all cells in the sheets of an
Excel file is extracted. This is useful if fill color or text formatting
is used to denote information that is not redundant with columns (i.e.,
information that is lost when reading an Excel sheet into an API).
read_xl_format
uses the same syntax as
read_xl_sheets
to maximize interoperability.
# Read in *format of* sheets
form_list <- scicomptools::read_xl_format(file_name = system.file("extdata", "faux_data.xlsx",
package = "scicomptools"))
# Show structure of that
dplyr::glimpse(form_list)
#> Rows: 36
#> Columns: 13
#> $ sheet <chr> "my_data", "my_data", "my_data", "my_data", "my_data", "…
#> $ address <chr> "A1", "B1", "C1", "D1", "E1", "A2", "B2", "C2", "D2", "E…
#> $ row <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4,…
#> $ col <int> 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4,…
#> $ cell_contents <chr> "col_1", "col_2", "col_3", "col_4", "col_5", "a", "A", "…
#> $ comment <chr> NA, NA, NA, NA, "Nick Lyon:Hello world", NA, NA, NA, NA,…
#> $ formula <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "SUM(C2:D2)", NA, NA…
#> $ bold <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
#> $ italic <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
#> $ underline <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ font_size <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, …
#> $ font_color <chr> "FF000000", "FF000000", "FF000000", "FF000000", "FF00000…
#> $ cell_color <chr> NA, NA, NA, NA, NA, NA, NA, "FFFFFFFF", NA, NA, NA, NA, …
Handling Two Working Directories
Many of the working groups that we support work both on local computers as well as in remote servers. We leverage Git and GitHub to pass code updates back and forth between the two environments but updating the working directory every time such a pivot is made is cumbersome.
We have thus written wd_loc
for this purpose!
wd_loc
allows users to specify two working directory paths
and toggle between them using a logical argument.
The local
argument accepts a logical value. If
local = TRUE
, then the file path specified in the
local_path
argument is returned as a character string. If
local = FALSE
then the file path specified in the
remote_path
argument is used. local_path
defaults to base::getwd()
so it need not be specified if
you are using RStudio Projects.
You can run wd_loc
inside of base::setwd
if
desired though we recommend assigning the file path to a “path” object
and invoking that whenever import or export must be done.
#> /Users/.../scicomptools/vignettes
Checking Tokens
Some operations require passing a user’s personal access token (a.k.a. “PAT”) to RStudio. These workflows can fail unexpectedly if the token is improperly set or expires so it is valuable to double check whether the token is still embedded.
The token_check
function checks whether a token is
attached and messages whether or not one is found. Currently this
function supports checks for Qualtrics and GitHub tokens but this can be
expanded if need be (post a GitHub
issue if you have another token in mind!). Additionally, if
secret = TRUE
(the default) the success message simply
identifies whether a token is found. If secret = FALSE
the
success message prints the token string in the console.
scicomptools::token_check(api = "github", secret = TRUE)
Exporting GitHub Issues as PDF Files
Sometimes it’s helpful to save GitHub issues as a reference for the
future, but manually exporting each issue as a PDF file can be
time-consuming. This is why issue_extract
was created to
streamline this process. When given the URL of a GitHub repository and a
numeric vector of issue numbers, this function will start a remote
Google Chrome browser session via Chromote and
automatically navigate to the specified GitHub issues to export them as
PDF files.
issue_extract
works for both public and non-public
GitHub repositories. Since non-public GitHub repositories require
authentication, first you will need to save your credentials and then
pass them into issue_extract
. Below is an example of using
this function to export GitHub issues from a private GitHub Enterprise
repository.
The first step is to open a new Chrome session.
# Start new Chrome session
b <- ChromoteSession$new()
# Open interactive window
b$view()
Then navigate to any page on the GitHub Enterprise/private repository
and enter your credentials or login information on the browser session.
Once you’ve logged in, save the cookies associated with this browser
session as an R object using b$Network$getCookies()
. Export
this R object for later use and close the browser once you’re done.
# Navigate to a random Github Enterprise issue
# Make sure to log in to GitHub Enterprise
b$Page$navigate("https://github.nceas.ucsb.edu/LTER/lter-wg-scicomp/issues/278")
# Save credentials as cookies
cookies <- b$Network$getCookies()
# Export cookies for later use
saveRDS(cookies, "cookies.rds")
# Close the browser tab/window
b$close()
Finally, restart your R session. Now you should be able to pass the
file containing your cookies into issue_extract
and access
GitHub Enterprise/private repositories!
# After saving cookies, you should restart R
# Then read in the cookies and export the necessary issues
# Export GitHub issues #295 through #300 for a GitHub Enterprise repository
issue_extract(repo_url = "https://github.nceas.ucsb.edu/LTER/lter-wg-scicomp",
issue_nums = 295:300,
export_folder = "scicomp_issues",
cookies = "cookies.rds")