Creating R packages

Why packages?

Most R users are familiar with loading and using packages in their work, and they know how rich CRAN is in providing for many conceivable needs. Yet most have never created a package for their own work, often because they think the process is too complicated. It is not. Creating packages serves two main use cases:

Mechanism to redistribute reusable code
Mechanism to reproducibly document analysis and models and their results

The devtools and roxygen2 packages make creating and maintaining a package straightforward.

Creating a package for your own personal use is a convenient way to build utility functions that you can include throughout your code, rather than using source to import files. Using source can work, but it makes your code more fragile by introducing dependencies on the specific locations of the files that you source. In comparison, an installed package is available to your code anywhere, and is easily updatable.

Install and load packages

# install.packages("devtools")
library(devtools)
library(roxygen2)
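
Before loading the two packages, it can help to confirm that both are actually installed. The following is a minimal sketch using base R's requireNamespace(); the helper name missing_packages is hypothetical and is not part of devtools or roxygen2.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

needed <- c("devtools", "roxygen2")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
    message("Not installed: ", paste(to_install, collapse = ", "))
    # Uncomment to install whatever is missing:
    # install.packages(to_install)
} else {
    message("All required packages are installed.")
}
```

The install.packages() call is left commented out, as in the block above, so that running the sketch never changes your library without you asking for it.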

Create a directory for the package

Thanks to the great devtools package, it takes only one function call, create(), to build the skeleton of an R package, which eliminates pretty much every reason for procrastination. All you do is:

library("devtools")
create("mytools")

This creates a top-level directory containing a number of critical files laid out in the standard R package structure. The most important of these is the DESCRIPTION file, which provides metadata about your package. Edit the DESCRIPTION file to provide reasonable values for each of the fields. Information about choosing a license is provided in the Writing R Extensions manual.

Package: mytools
Title: Utility functions created by Matt Jones
Version: 0.1
Authors@R: person("Matthew", "Jones", email = "jones@nceas.ucsb.edu", role = c("aut", "cre"))
Description: Package mytools contains a suite of utility functions useful whenever I need stuff to get done.
Depends: R (>= 3.1.0)
License: Apache License (== 2.0)
LazyData: true
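
One quick way to sanity-check an edited DESCRIPTION file is to read it back with base R's read.dcf(), which parses the same "Field: value" format. The sketch below writes a shortened copy of the example to a temporary file so that it runs anywhere; in a real package you would point read.dcf() at the DESCRIPTION file inside the package directory instead.

```r
# Write a shortened copy of the example DESCRIPTION to a temporary file.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
    "Package: mytools",
    "Title: Utility functions created by Matt Jones",
    "Version: 0.1",
    "Depends: R (>= 3.1.0)",
    "License: Apache License (== 2.0)",
    "LazyData: true"
), desc_file)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_file)
print(desc[1, c("Package", "Version", "License")])

# package_version() raises an error if the Version field is malformed.
pkg_version <- package_version(desc[1, "Version"])
```

If a field name is misspelled, the corresponding column will simply be missing from the matrix, which makes typos easy to spot.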

For more on when to list a package under Imports or Depends, see the related discussion on StackOverflow. In brief:

Avoid Depends as much as possible. It’s basically like calling library(other_package) every time your package is loaded, which attaches the other package and all of its functions to the user's search path. Instead, list a package under Imports like so:

Imports:
    ggplot2

Then, in the roxygen documentation for any function that uses the package, add the following line:

#' @importFrom ggplot2 ggplot

This ensures that only the functions you need are imported and that each one is listed explicitly in the NAMESPACE file. If you need to use all (or most) of the functions from a package, you can import all of them:

#' @import ggplot2
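
To see how the tag sits alongside a function, here is a small sketch of the same pattern. It imports median() from the stats package, which ships with R, rather than ggplot2, so that it runs without installing anything; the function center_values is hypothetical. In a package, the DESCRIPTION file would list stats under Imports in the same way ggplot2 is listed above.

```r
#' Center a numeric vector on its median.
#'
#' @param x A numeric vector.
#' @return x with its median subtracted from every element.
#' @importFrom stats median
#' @export
center_values <- function(x) {
    # Inside a package, the @importFrom tag above makes median() available
    # through NAMESPACE; stats::median() is an equivalent explicit form.
    x - median(x)
}

center_values(c(1, 2, 3, 10))
```

When the file is run directly like this, the roxygen lines are ordinary comments; they only take effect once document() turns them into NAMESPACE entries.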

For an example to poke around, see the codyn package that came out of the NCEAS community dynamics working group https://github.com/laurenmh/codyn

See how we listed an import, and how we imported specific functions from a package:
1. An import listed in DESCRIPTION: https://github.com/laurenmh/codyn/blob/master/DESCRIPTION#L24
2. A list of specifically imported functions from a package: https://github.com/laurenmh/codyn/blob/master/R/community_stability.R#L28

Add your code

The skeleton package contains a directory named R, which should hold your source files. Add your functions and classes in files in this directory, choosing names that don’t conflict with existing packages. For example, you might add a file info.R that contains a function environment_info() that you might want to reuse. This one might leave something to be desired…

environment_info <- function(msg) {
    print("This should really do something useful!")
    print(paste("Also print the incoming message: ", msg))
}
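
For comparison, a version that does something slightly more useful might look like the sketch below. The name environment_summary and the fields it reports are hypothetical choices for illustration; the rest of this tutorial keeps using the simpler environment_info().

```r
# Hypothetical alternative: gather a few facts about the running R session
# and return them, so callers can print, log, or test the result.
environment_summary <- function(msg = "") {
    info <- list(
        r_version   = R.version.string,
        platform    = R.version$platform,
        working_dir = getwd(),
        message     = msg
    )
    # Print one "field: value" line per entry.
    for (field in names(info)) {
        cat(field, ": ", info[[field]], "\n", sep = "")
    }
    # Return the list invisibly so it is not printed a second time.
    invisible(info)
}

info <- environment_summary("Starting analysis")
```

Returning the values instead of only printing them makes the function easier to reuse from other code in the package.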

Add documentation

You should provide documentation for each of your functions and classes. With roxygen2, you do this by embedding specially formatted comments in the source code files, which are in turn converted into manual pages and other R documentation artifacts. Be sure to describe the overall purpose of the function and each of its parameters.

#' A function to print information about the current environment.
#'
#' This function prints current environment information, and a message.
#' @param msg The message that should be printed
#' @keywords debugging
#' @export
#' @examples
#' environment_info("Hi, what is your name?")
environment_info <- function(msg) {
    print("This should really do something useful!")
    print(paste("Also print the incoming message: ", msg))
}

Once your files are documented, you can then process the documentation using the document() function to generate the appropriate .Rd files that your package needs.

setwd("./mytools")
document()
# or document(".")

That’s really it. You now have a package that you can check() and install() and release(). See below for these helper utilities.

Exercise

Add two functions to the mytools package for converting temperatures from Fahrenheit to Celsius and back. Then check, build, and install the package, and then use it from the console to do some conversions.
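
One possible solution is sketched below. The function names fahr_to_celsius and celsius_to_fahr and the file they would live in (for example R/temperature.R) are assumptions; any names that do not clash with existing packages would do.

```r
#' Convert temperatures from Fahrenheit to Celsius.
#'
#' @param fahr A numeric vector of temperatures in degrees Fahrenheit.
#' @return A numeric vector of temperatures in degrees Celsius.
#' @export
#' @examples
#' fahr_to_celsius(212)
fahr_to_celsius <- function(fahr) {
    (fahr - 32) * 5 / 9
}

#' Convert temperatures from Celsius to Fahrenheit.
#'
#' @param celsius A numeric vector of temperatures in degrees Celsius.
#' @return A numeric vector of temperatures in degrees Fahrenheit.
#' @export
#' @examples
#' celsius_to_fahr(100)
celsius_to_fahr <- function(celsius) {
    celsius * 9 / 5 + 32
}
```

Both functions are vectorized, so a call such as fahr_to_celsius(c(32, 212)) converts several temperatures at once.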

Checking and installing your package

Now that your package is built, you can check it for consistency and completeness using check(), and then install it locally using install(). As written here, install("mytools") needs to be run from the parent directory of your package.

setwd("./mytools")
check()
setwd("..")
install("mytools")

Your package is now available for use in your local environment.
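
A quick way to confirm this from a fresh R session is sketched below. It assumes the package was installed under the name mytools and still exports environment_info(); the requireNamespace() guard is only there so the sketch does not fail if the package has not been installed yet.

```r
# requireNamespace() returns FALSE instead of raising an error when the
# package is missing, so this is safe to run before installing.
if (requireNamespace("mytools", quietly = TRUE)) {
    library(mytools)
    environment_info("Hello from the installed package")
} else {
    message("mytools is not installed; run install(\"mytools\") first.")
}
```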

Creating R packages

Matt Jones and Karthik Ram

July 28, 2014

Sharing and releasing your package