In this lesson we will learn how to perform some basic spatial analysis in R
. First we will review interpolation using the IDW interpolation method.
At the end of this lesson you will be able to:
writeRaster()
First, we load all the libraries that we plan to use in our code and ensure that the working directory is set.
# set working dir
setwd("~/Documents/data/oss/oss2017data")
# load libraries
library(dplyr)
library(tidyr)
library(gstat)
library(rgdal)
library(raster)
library(rgeos)
library(scales)
options(stringsAsFactors = FALSE)
Next, let’s get the data. The data that we are using today - similar to yesterday are stored on Figshare. We will use R to
# download the data from a figshare URL
download.file("https://ndownloader.figshare.com/files/8955466",
destfile = "./tues-data.zip")
#unzip the data into our oss2017 data directory
unzip("tues-data.zip")
Sometimes we want to extract values from a raster dataset and assign them to points or polygons. In the GIS works (Arc and QGIS) we do this via zonal statistics. In the R world we can perform the same tasks using the extract function.
In this segment of the lesson, we will use a set of points. At each point we will extract raster values from a buffer region around that point and calculate a summary stat (in this case a mean raster value). We will then plot the points on a map weighting the size according to relative mean value.
To begin let’s import some raster data and the same study area file that we used in class yesterday. Note that the data are in .nc
format. This is a hierarchical format which we will discuss more this afternoon.
study-area/study-area.shp
climate/air.sfc.mon.mean.nc
(HINT: this is a raster)study_area <- readOGR("study-area/study-area.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "study-area/study-area.shp", layer: "study-area"
## with 1 features
## It has 1 fields
## Integer64 fields read as strings: id
# import climate data - netcdf format
climate <- raster("climate/air.sfc.mon.mean.nc")
# check out our data - notice any issues?
crs(study_area)
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
extent(study_area)
## class : Extent
## xmin : -107.5188
## xmax : -66.48791
## ymin : 10.817
## ymax : 37.22675
crs(climate)
## CRS arguments:
## +proj=lcc +x_0=5632642.22547 +y_0=4612545.65137 +lat_0=50
## +lon_0=-107 +lat_1=50 +lat_2=50 +ellps=WGS84
extent(climate)
## class : Extent
## xmin : -16231.49
## xmax : 11313351
## ymin : -16231.5
## ymax : 8976020
Next – what do we do? Notice any issues? Our data are not in the same projection. We will need to reproject the data in order to perform any processing on it.
Generally I try to avoid reprojecting raster data if at all possible. In this case, let’s reproject our raster data because we are working with a relatively small area (relative to the globe) and we are only extracting values from the data - we won’t be using it for further processing steps.
Generally i’d prefer to reproject the vector data.
Do the following
# reproject study area layer climate data
climate_geog <- projectRaster(climate,
crs = crs(study_area))
# crop the data
climate_geog_cr <- crop(climate_geog,
study_area)
plot(climate_geog_cr)
plot(study_area,
add = TRUE)
If we wanted, we could export our reprojected and cropped climate data as a geotiff.
Next, let’s import our vector data that we downloaded from figshare.
# import vector data
sea_level_2000_sp <- readOGR("tues-data/sea_level_2000.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "tues-data/sea_level_2000.shp", layer: "sea_level_2000"
## with 16 features
## It has 12 fields
crs(sea_level_2000_sp)
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
extent(sea_level_2000_sp)
## class : Extent
## xmin : -97.21667
## xmax : -81.105
## ymin : 24.555
## ymax : 30.40333
Next we will extract climate surface temperature values for each point location in our sea_level_2000
layer. Rather than select a single value, we will create a circular BUFFER around each point and then extract raster values that fall within that circle. Finally, we will calculate a mean of all pixels values to create a final value.
# Note that below will return a data.frame containing the max height
# calculated from all pixels in the buffer for each plot
climate_mean <- raster::extract(climate_geog_cr, # the raster that you wish to extract values from
sea_level_2000_sp, # a point, or polygon spatial object
buffer = .5, # specify a .5 degree radius
fun = mean, # extract the MEAN value from each plot
sp = TRUE) # create spatial object
class(climate_mean)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
The default option when we extract data in R is to store all of the raster pixel values in a list. However, when we add a function argument to extract(), R summarizes the data for us. Also note that we are using the sp = TRUE argument to tell R to create a spatialPointsDataFrame.
This will allow us to plot our data!
Speaking of plot - let’s plot the points on top of our climate data! Notice below that i’ve tricked R to plot the points by surface temperature value. Because the range of the data is not large, i’ve tricked R by sending the cex = argument (which specifies the point size) a command which
cex = ((climate_mean$Monthly.air.temperature.at.Surface - 280)/10)
# view mean values
climate_mean$Monthly.air.temperature.at.Surface
## [1] 291.2770 291.7492 291.6662 291.4210 291.2210 287.7624 287.1931
## [8] 292.1241 283.0992 291.6178 283.5167 287.2808 294.3733 294.0009
## [15] 297.4615 297.5251
# scale the data
(climate_mean$Monthly.air.temperature.at.Surface - 280)/10
## [1] 1.1277020 1.1749156 1.1666199 1.1420974 1.1220982 0.7762443 0.7193124
## [8] 1.2124098 0.3099197 1.1617819 0.3516650 0.7280778 1.4373332 1.4000871
## [15] 1.7461509 1.7525143
Let’s plot the data.
# plot climate data
plot(climate_geog_cr,
main = "sample points sized by surface temperature")
plot(climate_mean,
pch = 19,
# size the points according to the temp value
cex = ((climate_mean$Monthly.air.temperature.at.Surface - 280)/10),
add = TRUE)
Let’s try to scale the points using the scales library. Here we scale the elevation data to to a range inclusive of 1:3.
# plot climate data
plot(climate_geog_cr,
main = "sample points sized by surface temperature")
plot(climate_mean,
pch = 19,
# size the points according to the temp value
cex = scales::rescale(climate_mean$Monthly.air.temperature.at.Surface - 280, to = c(1, 3)),
add = TRUE)
Below is an example of applying a color ramp to the data. In this case we don’t need to do this because we have resized the points.
#Create a function to generate a continuous color palette
rbPal <- colorRampPalette(c('yellow', 'purple'))
#This adds a column of color values
# based on the y values
climate_mean$col <- rbPal(3)[as.numeric(cut(climate_mean$Monthly.air.temperature.at.Surface, breaks = 3))]
plot(study_area,
main = "Mean surface temperature at select locations",
axes = FALSE)
plot(climate_geog_cr,
col = grey.colors(n = 100, 0, 1),
add = TRUE)
plot(climate_mean,
pch = 19,
# size the points according to the temp value
cex = (((climate_mean$Monthly.air.temperature.at.Surface) - 280)/10),
col = climate_mean$col,
add = TRUE)
Great work! You’ve now extracted values at your study area. Next, we will walk through an example of interpolating our data.
First we open some point data. Here we have sea level elevation for the year 2000.
# import csv file
sea_level_2000 <- read.csv("tues-data/sea_lev_2000.csv")
# view data structure
str(sea_level_2000)
## 'data.frame': 20 obs. of 12 variables:
## $ X.1 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
## $ elev_mm : int 7133 7085 6989 7079 7300 7012 7021 7138 6983 NA ...
## $ N : chr "N" "N" "N" "N" ...
## $ X : int 0 0 0 0 0 0 0 0 0 0 ...
## $ station_id: int 497 538 725 828 161 1835 1903 526 1884 1156 ...
## $ lat : num 26.1 28 28.9 29.3 29.3 ...
## $ long : num -97.2 -97 -95.3 -94.8 -94.8 ...
## $ name : chr " PORT ISABEL " " ROCKPORT " " FREEPORT " " GALVESTON I, PLEASURE PIER, TX " ...
## $ coast : int 940 940 940 940 940 940 940 940 940 940 ...
## $ coast_num : int 1 5 6 7 8 10 15 21 32 37 ...
## $ C : chr " N" " N" " N" " N" ...
# check out our data
sea_level_2000$elev_mm
## [1] 7133 7085 6989 7079 7300 7012 7021 7138 6983 NA 7064 6959 6901 NA
## [15] NA 7138 7015 NA 7211 7110
Notice that we have some NA values (missing data values) in our data. Let’s remove those. We will use a pipe combined with the tidyr
function drop_na() to remove all rows containing na values.
# remove NA values
sea_level_2000 <- sea_level_2000 %>%
drop_na()
Next, we will create a new object. We’ll use _sp in the name as we will turn this object into a spatial object. Then we will make sure that we have a default x and y column that stores longitude and latitude information respectively.
# create spatial points object
sea_level_2000_sp <- sea_level_2000
class(sea_level_2000_sp)
## [1] "data.frame"
In the next series of steps we convert our data.frame object into a spatial object using the coordinates()
function. Notice that we are using a different notation below to perform this.
~x + y
simply tells R to use the x and y columns to respectively to convert a non spatial object into a spatial object.
# convert the data into spatial coordinates
coordinates(sea_level_2000_sp) <- ~long + lat
class(sea_level_2000_sp)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
Let’s plot our data to see what it looks like.
# view spatial points
plot(sea_level_2000_sp,
main = "Gulf - sea level point location data")
In the next step, we will setup our interpolation. We will need to do the following:
In the example below, we use the Inverse Distance Weighted (IDW) interpolator.
First, let’s create our grid. We are going to use a specified (in this case) spatial extent to populate this grid. There is no magic here associated with creating the grid area. You will have to decide that area that you think is reasonable to perform the interpolation on.
##### IDW interpolation #####
# establish an extent within which you want to interpolate
# -99/24 to -80/32.
x_range <- as.numeric(c(-99, -80)) # min/max longitude of the interpolation area
y_range <- as.numeric(c(24, 32)) # min/max latitude of the interpolation area
# create an empty grid of values ranging from the xmin-xmax, ymin-ymax
grd <- expand.grid(x = seq(from = x_range[1],
to = x_range[2],
by = 0.1),
y = seq(from = y_range[1], to = y_range[2],
by = 0.1)) # expand points to grid
class(grd)
## [1] "data.frame"
Next we will convert this grid into a spatial points and then spatial pixels object.
# Convert grd object to a matrix and then turn into a spatial
# points object
coordinates(grd) <- ~x + y
# turn into a spatial pixels object
gridded(grd) <- TRUE
#### view grid with points overlayed
plot(grd, cex = 1.5, col = "grey")
plot(sea_level_2000_sp,
pch = 15,
col = "red",
cex = 1,
add = TRUE)
In the last step we use idw() to interpolate our points to a grid. idw() takes several arguments
# interpolate the data
idw_pow1 <- idw(formula = elev_mm ~ 1,
locations = sea_level_2000_sp,
newdata = grd,
idp = 1)
## [inverse distance weighted interpolation]
# Notice that the output data is a SpatialPixelsDataFrame
class(idw_pow1)
## [1] "SpatialPixelsDataFrame"
## attr(,"package")
## [1] "sp"
# plot the data
plot(idw_pow1,
col = terrain.colors(55))
# convert spatial pixels df to raster
idw_pow1_ras <- raster(idw_pow1)
# export to geotif
writeRaster(idw_pow1_ras,
filename = "idw_pow1.tif", "GTiff")
Let’s create a difference surface with a larger power.
# interpolate the data
idw_pow3 <- idw(formula = elev_mm ~ 1,
locations = sea_level_2000_sp,
newdata = grd,
idp = 3)
## [inverse distance weighted interpolation]
# plot the data
plot(idw_pow3,
col = terrain.colors(55))
Finally, let’s explore how distance impacts our interpolation.
# interpolate the data
idw_dist1 <- idw(formula = elev_mm ~ 1,
locations = sea_level_2000_sp,
newdata = grd,
idp = 1,
maxdist=1)
## [inverse distance weighted interpolation]
plot(idw_dist1,
main = "IDW: distance = 1 degree, power = 1")
# interpolate the data
idw_dist5 <- idw(formula = elev_mm ~ 1,
locations = sea_level_2000_sp,
newdata = grd,
idp = .2,
maxdist = 5)
## [inverse distance weighted interpolation]
plot(idw_dist5,
main = "IDW: distance = 5 degrees, power = 1")
Let’s increase the distance evern more
# interpolate the data
idw_dist15 <- idw(formula = elev_mm ~ 1,
locations = sea_level_2000_sp,
newdata = grd,
idp = .2,
maxdist = 15)
## [inverse distance weighted interpolation]
plot(idw_dist15,
main = "IDW: distance = 15 degrees, power = .2")