Intro to spatial data in R - Open and plot raster and vector data with base plot

Introduction

In this lesson we will learn how to perform some basic spatial analysis in R. First we will review interpolation using the IDW interpolation method.

Learning outcomes

At the end of this lesson you will be able to:

Interpolate data using Inverse Distance Weighted (IDW) interpolator
Export a raster to geotiff format using writeRaster()

Setup

First, we load all the libraries that we plan to use in our code and ensure that the working directory is set.

# set working dir
setwd("~/Documents/data/oss/oss2017data")
# load libraries
library(dplyr)
library(tidyr)
library(gstat)
library(rgdal)
library(raster)
library(rgeos)
library(scales)
options(stringsAsFactors = FALSE)

Next, let’s get the data. The data that we are using today - similar to yesterday are stored on Figshare. We will use R to

download the data and
unzip it into the SAME data directory that we used yesterday

# download the data from a figshare URL
download.file("https://ndownloader.figshare.com/files/8955466",
              destfile = "./tues-data.zip")
#unzip the data into our oss2017 data directory
unzip("tues-data.zip")

Extract data from a raster in R

Sometimes we want to extract values from a raster dataset and assign them to points or polygons. In the GIS works (Arc and QGIS) we do this via zonal statistics. In the R world we can perform the same tasks using the extract function.

In this segment of the lesson, we will use a set of points. At each point we will extract raster values from a buffer region around that point and calculate a summary stat (in this case a mean raster value). We will then plot the points on a map weighting the size according to relative mean value.

Extract data from a buffer

To begin let’s import some raster data and the same study area file that we used in class yesterday. Note that the data are in .nc format. This is a hierarchical format which we will discuss more this afternoon.

open study-area/study-area.shp
open climate/air.sfc.mon.mean.nc (HINT: this is a raster)

study_area <- readOGR("study-area/study-area.shp")

## OGR data source with driver: ESRI Shapefile 
## Source: "study-area/study-area.shp", layer: "study-area"
## with 1 features
## It has 1 fields
## Integer64 fields read as strings:  id

# import climate data - netcdf format
climate <- raster("climate/air.sfc.mon.mean.nc")

# check out our data - notice any issues?
crs(study_area)

## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

extent(study_area)

## class       : Extent 
## xmin        : -107.5188 
## xmax        : -66.48791 
## ymin        : 10.817 
## ymax        : 37.22675

crs(climate)

## CRS arguments:
##  +proj=lcc +x_0=5632642.22547 +y_0=4612545.65137 +lat_0=50
## +lon_0=-107 +lat_1=50 +lat_2=50 +ellps=WGS84

extent(climate)

## class       : Extent 
## xmin        : -16231.49 
## xmax        : 11313351 
## ymin        : -16231.5 
## ymax        : 8976020

Next – what do we do? Notice any issues? Our data are not in the same projection. We will need to reproject the data in order to perform any processing on it.

Generally I try to avoid reprojecting raster data if at all possible. In this case, let’s reproject our raster data because we are working with a relatively small area (relative to the globe) and we are only extracting values from the data - we won’t be using it for further processing steps.

Generally i’d prefer to reproject the vector data.

Do the following

reproject the raster data (HINT: use the projectRaster function)
crop the raster data using the study_area layer.
finally - plot both layers together using the plot function

# reproject study area layer climate data
climate_geog <- projectRaster(climate,
                              crs = crs(study_area))
# crop the data
climate_geog_cr <- crop(climate_geog,
                        study_area)
plot(climate_geog_cr)
plot(study_area,
     add = TRUE)

If we wanted, we could export our reprojected and cropped climate data as a geotiff.

Import vector points layer

Next, let’s import our vector data that we downloaded from figshare.

# import vector data
sea_level_2000_sp <- readOGR("tues-data/sea_level_2000.shp")

## OGR data source with driver: ESRI Shapefile 
## Source: "tues-data/sea_level_2000.shp", layer: "sea_level_2000"
## with 16 features
## It has 12 fields

crs(sea_level_2000_sp)

## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

extent(sea_level_2000_sp)

## class       : Extent 
## xmin        : -97.21667 
## xmax        : -81.105 
## ymin        : 24.555 
## ymax        : 30.40333

Next we will extract climate surface temperature values for each point location in our sea_level_2000 layer. Rather than select a single value, we will create a circular BUFFER around each point and then extract raster values that fall within that circle. Finally, we will calculate a mean of all pixels values to create a final value.

# Note that below will return a data.frame containing the max height
# calculated from all pixels in the buffer for each plot
climate_mean <- raster::extract(climate_geog_cr, # the raster that you wish to extract values from
                    sea_level_2000_sp, # a point, or polygon spatial object
                    buffer = .5, # specify a .5 degree radius
                    fun = mean, # extract the MEAN value from each plot
                    sp = TRUE) # create spatial object
class(climate_mean)

## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

The default option when we extract data in R is to store all of the raster pixel values in a list. However, when we add a function argument to extract(), R summarizes the data for us. Also note that we are using the sp = TRUE argument to tell R to create a spatialPointsDataFrame.

This will allow us to plot our data!

Speaking of plot - let’s plot the points on top of our climate data! Notice below that i’ve tricked R to plot the points by surface temperature value. Because the range of the data is not large, i’ve tricked R by sending the cex = argument (which specifies the point size) a command which

takes the climate_mean value and subtracts 280
divides by 10 to make the points smaller

cex = ((climate_mean$Monthly.air.temperature.at.Surface - 280)/10)

# view mean values
climate_mean$Monthly.air.temperature.at.Surface

##  [1] 291.2770 291.7492 291.6662 291.4210 291.2210 287.7624 287.1931
##  [8] 292.1241 283.0992 291.6178 283.5167 287.2808 294.3733 294.0009
## [15] 297.4615 297.5251

# scale the data
(climate_mean$Monthly.air.temperature.at.Surface - 280)/10

##  [1] 1.1277020 1.1749156 1.1666199 1.1420974 1.1220982 0.7762443 0.7193124
##  [8] 1.2124098 0.3099197 1.1617819 0.3516650 0.7280778 1.4373332 1.4000871
## [15] 1.7461509 1.7525143

Let’s plot the data.

# plot climate data
plot(climate_geog_cr,
     main = "sample points sized by surface temperature")
plot(climate_mean,
     pch = 19,
     # size the points according to the temp value
     cex = ((climate_mean$Monthly.air.temperature.at.Surface - 280)/10),
     add = TRUE)

Let’s try to scale the points using the scales library. Here we scale the elevation data to to a range inclusive of 1:3.

# plot climate data
plot(climate_geog_cr,
     main = "sample points sized by surface temperature")
plot(climate_mean,
     pch = 19,
     # size the points according to the temp value
     cex = scales::rescale(climate_mean$Monthly.air.temperature.at.Surface - 280, to = c(1, 3)),
     add = TRUE)

Below is an example of applying a color ramp to the data. In this case we don’t need to do this because we have resized the points.

#Create a function to generate a continuous color palette
rbPal <- colorRampPalette(c('yellow', 'purple'))

#This adds a column of color values
# based on the y values
climate_mean$col <- rbPal(3)[as.numeric(cut(climate_mean$Monthly.air.temperature.at.Surface, breaks = 3))]

plot(study_area,
     main = "Mean surface temperature at select locations",
     axes = FALSE)
plot(climate_geog_cr,
     col = grey.colors(n = 100, 0, 1),
     add = TRUE)
plot(climate_mean,
     pch = 19,
     # size the points according to the temp value
     cex = (((climate_mean$Monthly.air.temperature.at.Surface) - 280)/10),
     col = climate_mean$col,
     add = TRUE)

Great work! You’ve now extracted values at your study area. Next, we will walk through an example of interpolating our data.

Going from points to raster - spatial interpolation in R

First we open some point data. Here we have sea level elevation for the year 2000.

# import csv file
sea_level_2000 <- read.csv("tues-data/sea_lev_2000.csv")

# view data structure
str(sea_level_2000)

## 'data.frame':    20 obs. of  12 variables:
##  $ X.1       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ year      : int  2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
##  $ elev_mm   : int  7133 7085 6989 7079 7300 7012 7021 7138 6983 NA ...
##  $ N         : chr  "N" "N" "N" "N" ...
##  $ X         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ station_id: int  497 538 725 828 161 1835 1903 526 1884 1156 ...
##  $ lat       : num  26.1 28 28.9 29.3 29.3 ...
##  $ long      : num  -97.2 -97 -95.3 -94.8 -94.8 ...
##  $ name      : chr  " PORT ISABEL                             " " ROCKPORT                                " " FREEPORT                                " " GALVESTON I, PLEASURE PIER, TX          " ...
##  $ coast     : int  940 940 940 940 940 940 940 940 940 940 ...
##  $ coast_num : int  1 5 6 7 8 10 15 21 32 37 ...
##  $ C         : chr  " N" " N" " N" " N" ...

# check out our data
sea_level_2000$elev_mm

##  [1] 7133 7085 6989 7079 7300 7012 7021 7138 6983   NA 7064 6959 6901   NA
## [15]   NA 7138 7015   NA 7211 7110

Notice that we have some NA values (missing data values) in our data. Let’s remove those. We will use a pipe combined with the tidyr function drop_na() to remove all rows containing na values.

# remove NA values
sea_level_2000 <- sea_level_2000 %>%
  drop_na()

Next, we will create a new object. We’ll use _sp in the name as we will turn this object into a spatial object. Then we will make sure that we have a default x and y column that stores longitude and latitude information respectively.

# create spatial points object
sea_level_2000_sp <- sea_level_2000
class(sea_level_2000_sp)

## [1] "data.frame"

Create spatial object

In the next series of steps we convert our data.frame object into a spatial object using the coordinates() function. Notice that we are using a different notation below to perform this.

~x + y simply tells R to use the x and y columns to respectively to convert a non spatial object into a spatial object.

# convert the data into spatial coordinates
coordinates(sea_level_2000_sp) <- ~long + lat
class(sea_level_2000_sp)

## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

Let’s plot our data to see what it looks like.

# view spatial points
plot(sea_level_2000_sp,
     main = "Gulf - sea level point location data")

In the next step, we will setup our interpolation. We will need to do the following:

We create an EMPTY grid across a spatial extent that we with to interpolate our data within
We populate that grid with interpolated values generated from the points layer that we created above.

In the example below, we use the Inverse Distance Weighted (IDW) interpolator.

First, let’s create our grid. We are going to use a specified (in this case) spatial extent to populate this grid. There is no magic here associated with creating the grid area. You will have to decide that area that you think is reasonable to perform the interpolation on.

##### IDW interpolation #####
# establish an extent within which you want to interpolate
# -99/24 to -80/32.
x_range <- as.numeric(c(-99, -80))  # min/max longitude of the interpolation area
y_range <- as.numeric(c(24, 32))  # min/max latitude of the interpolation area

# create an empty grid of values ranging from the xmin-xmax, ymin-ymax
grd <- expand.grid(x = seq(from = x_range[1],
                   to = x_range[2], 
                   by = 0.1),
                   y = seq(from = y_range[1],                                           to = y_range[2], 
                       by = 0.1))  # expand points to grid
class(grd)

## [1] "data.frame"

Next we will convert this grid into a spatial points and then spatial pixels object.

# Convert grd object to a matrix and then turn into a spatial
# points object
coordinates(grd) <- ~x + y
# turn into a spatial pixels object
gridded(grd) <- TRUE

#### view grid with points overlayed
plot(grd, cex = 1.5, col = "grey")
plot(sea_level_2000_sp,
       pch = 15,
       col = "red",
       cex = 1,
       add = TRUE)

In the last step we use idw() to interpolate our points to a grid. idw() takes several arguments

first the formula: elev_mm ~ 1 tells r to use the x,y coordinates combined with the elev_mm value to perform the griding
locations = represents the spatial points objects that you wish to grid
newdata = represents the grid object that you will insert the values into
finally we specify the power. Generall power values range between 1-3 with a smaller number creating a smoother surface (stronger influence from surrounding points) and a larger number creating a surface that is more true to the actual data and in turn a less smooth surface potentially.

# interpolate the data
idw_pow1 <- idw(formula = elev_mm ~ 1,
           locations = sea_level_2000_sp,
           newdata = grd,
           idp = 1)

## [inverse distance weighted interpolation]

# Notice that the output data is a SpatialPixelsDataFrame
class(idw_pow1)

## [1] "SpatialPixelsDataFrame"
## attr(,"package")
## [1] "sp"

# plot the data
plot(idw_pow1,
     col = terrain.colors(55))

# convert spatial pixels df to raster
idw_pow1_ras <- raster(idw_pow1)
# export to geotif
writeRaster(idw_pow1_ras,
            filename = "idw_pow1.tif", "GTiff")

Let’s create a difference surface with a larger power.

# interpolate the data
idw_pow3 <- idw(formula = elev_mm ~ 1,
           locations = sea_level_2000_sp,
           newdata = grd,
           idp = 3)

## [inverse distance weighted interpolation]

# plot the data
plot(idw_pow3,
     col = terrain.colors(55))

Finally, let’s explore how distance impacts our interpolation.

# interpolate the data
idw_dist1 <- idw(formula = elev_mm ~ 1,
           locations = sea_level_2000_sp,
           newdata = grd,
           idp = 1,
           maxdist=1)

## [inverse distance weighted interpolation]

plot(idw_dist1,
     main = "IDW: distance = 1 degree, power = 1")

# interpolate the data
idw_dist5 <- idw(formula = elev_mm ~ 1,
           locations = sea_level_2000_sp,
           newdata = grd,
           idp = .2,
           maxdist = 5)

## [inverse distance weighted interpolation]

plot(idw_dist5,
     main = "IDW: distance = 5 degrees, power = 1")

Let’s increase the distance evern more

# interpolate the data
idw_dist15 <- idw(formula = elev_mm ~ 1,
           locations = sea_level_2000_sp,
           newdata = grd,
           idp = .2,
           maxdist = 15)

## [inverse distance weighted interpolation]

plot(idw_dist15,
     main = "IDW: distance = 15 degrees, power = .2")