Introduction

In this lesson we will learn how to perform some basic cleaning and plotting of spatial data in R.

Learning outcomes

At the end of this 30 minute overview you will be able to: 1. Open a vector data layer in R using readOGR() 1. Open a raster data layer in R using raster() 1. Create basic maps using ggplot() 1. Reproject and crop raster and vector data

Work with vector data in R

Intro to vector data in R - Earth Data Science website

Point, line OR polygon features can be stored within a vector dataset

Point, line OR polygon features can be stored within a vector dataset

There are many ways to import and map vector data in R.

To read the data, you have several options

# unzip data
#library(utils) 
setwd("~/Documents/data/oss-institute")
#setwd("~/Documents/github/oss-lessons/spatial-data-gis-law")
library(rgdal)
library(raster)
library(ggplot2)
library(rgeos)
library(mapview)
library(leaflet)
library(broom) # if you plot with ggplot and need to turn sp data into dataframes
options(stringsAsFactors = FALSE)

First, let’s download some data from natural earth.

# download the data 
download.file("http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_coastline.zip", 
              destfile = 'coastlines.zip')

# unzip the file
unzip(zipfile = "coastlines.zip", 
      exdir = 'ne-coastlines-10m')

Next, we can open the data using readOGR from the sp (spatial) package.

# load the data 
coastlines <- readOGR("ne-coastlines-10m/ne_10m_coastline.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "ne-coastlines-10m/ne_10m_coastline.shp", layer: "ne_10m_coastline"
## with 4132 features
## It has 2 fields

Are the data points, lines of polygons?
CHALLENGE – Looking at the data, what are the 2 possible vector data structures that this data could be stored in?

# view spatial attributes
class(coastlines)
## [1] "SpatialLinesDataFrame"
## attr(,"package")
## [1] "sp"
extent(coastlines)
## class       : Extent 
## xmin        : -180 
## xmax        : 180 
## ymin        : -85.22194 
## ymax        : 83.6341
crs(coastlines)
## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

Super speedy quick plot with R baseplot … or not. Be patient - this object has a lot of complex features

plot(coastlines, 
     main = "Global Coastlines")

This particular layer is complex. There are many details in the boundaries as rendered that we may want if we zoom in but may not need to produce a global map. let’s Simplify it. The gSimplify function is a part of the rgeos package. The simplify function removes vertices from complex lines. Remember that a line is composed of vertices. A circle is simply a line with lots of vertices - the more vertices it has, the more ‘round’ the line appears.
Simplify vertices

As you use this function keep in mind that you are modifying your data. You probably don’t want to do this if you are performing any sort of quantitative analysis on the data but you definitely want to do this if you are creating online maps and other visual products from your data.

The gSimplify function takes 3 arguments

  1. the data that you want to simplify
  2. tol - the tolerance value - a large number will remove more vertices, make the data small AND yield a “blockier” looking object. a SMALLER number will retain more vertices and maintain a smoother looking feature.
# simplify geometry
coastlines_simp <- gSimplify(coastlines, 
                            tol = 3, 
                            topologyPreserve = TRUE)
plot(coastlines_simp,
     main = "map with boundaries simplified")

Notice that here the map plots faster, but now it looks blocky. We may have simplified TOO MUCH. let’s reduce the tol = argument value to .1.

# simplify with a lower tolerance value (keeping more detail)
coastlines_sim2 <- gSimplify(coastlines, 
                             tol = .1, 
                             topologyPreserve = TRUE)
plot(coastlines_sim2, 
     main = "Map of coastlines - simplified geometry\n tol=.1")

That’s better. We now have enough detail for plotting purposes but have increased speed dramatically. These types of steps become important when creating online interactive maps to optimize speed.

IMPORTANT: when you modify the geometry you are also modifying the data - in this case any calculated perimeter or area values using these data will be compromised.

ggplot example

Less speedy plot with ggplot – but it looks so nice! NOTE: ggplot throws an error if you don’t include the data = argument for some reason on your geom_ element. Be sure to always expliceltly include this.

# turn the data into a spatial data frame 
coastlines_sim2_df <- SpatialLinesDataFrame(coastlines_sim2,
                                            coastlines@data) 
#tidy(coastlines_sim2_df)

# plot the data 
ggplot() +
  geom_path(data = coastlines_sim2_df, aes(x = long, y = lat, group = group)) +
  labs(title = "Global Coastlines - using ggplot")

coord_fixed() is your best friend – it will make the map x and y Also notice i’m using labs to add a title and x and y axis labels. Cool stuff.

ggplot() +
  geom_path(data = coastlines_sim2_df, aes(x = long, y = lat, group = group)) + 
  coord_fixed() + 
  labs(title = "My awesome ggplot map of coastlines",
       subtitle = "my awesome subtitle",
       x = "", y = "") # cause we don't need x and y labels do we?

Interactive maps with leaflet

Make it interactive using mapview. Mapview is just a wrapper for leaflet. Leaflet is a powerful javascript based tool that can be used to create interactive maps - like google maps. The mapview library in R is a wrapper around leaflet that allows you to quickly create interactive maps.

COOL TIP 1: note that when you knit to html your interactive map will be embedded in your html file! This means you can send your friends (or yourself! :) ) and interactive map! COOL TIP 2: you can also publish your interactive map to rpubs! Bryce will talk more about this later today.

# create leaflet 
mapview(coastlines_sim2)
# create a leaflet object
leaflet(coastlines_sim2) %>%
  addTiles() %>% # add basemap to your map
  # then add a lines layer
  addPolylines(color = "#444444", weight = 1, smoothFactor = 0.5,
    opacity = 1.0)

Vector data with SF

This is all great but – a bit slow. Sf will be the wave of the future. It’s much faster than sp however isn’t fully supported across all tools and projects just yet… it will be! Below is just a quick example of how it works. We won’t use it for today’s lesson…

library(sf)
# import the data - sf is much faster
coastlines_sf <- st_read("ne-coastlines-10m/ne_10m_coastline.shp")
## Reading layer `ne_10m_coastline' from data source `/Users/lewa8222/Documents/data/oss-institute/ne-coastlines-10m/ne_10m_coastline.shp' using driver `ESRI Shapefile'
## Simple feature collection with 4132 features and 2 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: -180 ymin: -85.22194 xmax: 180 ymax: 83.6341
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
# plotting is still a bit slow
plot(coastlines_sf[2])

SF will have full ggplot support in the future but for now, you can only get to it by installing ggplot from the github dev branch. For this reason we won’t use sf in this workshop but know that it will be more popular in the upcoming years.

# this only works if you install ggplot from github

#devtools::install_github("tidyverse/ggplot2")
#library(ggplot2)
# the dev version is not currently installing properly
ggplot() +
  geom_sf(data=coastlines_sf, aes(fill = featurecla))  

Plot two layers on top of each other

Next, let’s import another data layer and plot it on top of our coastlines.

us_states <- readOGR("us-boundaries/us_bound_pop.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "us-boundaries/us_bound_pop.shp", layer: "us_bound_pop"
## with 58 features
## It has 14 fields
us_states
## class       : SpatialPolygonsDataFrame 
## features    : 58 
## extent      : -124.7258, -66.94989, 24.49813, 49.38436  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 14
## names       :     AFFGEOI, STATEFP,  STATENS, GEOID, STUSPS,    NAME, LSAD,        ALAND,     AWATER,  region, GEO_id2, GEO_ds_, POPGROUP_d,  HD01_VD 
## min values  : 0400000US01,      01, 00068085,    01,     AL, Alabama,   00, 102262419204, 1026252314, Midwest,      01, Alabama,        001, 10006693 
## max values  : 0400000US56,      56, 01785534,    56,     WY, Wyoming,   00,  92790411854, 8504608966,    West,      56, Wyoming,        001,  9900571

Use add = TRUE to plot us_states on top of

plot(coastlines_sim2, 
     main = "Coastlines with the NE state boundaries")
plot(us_states, 
     add = TRUE,
     col = "purple")

CHALLENGE - Next, create a map with the following layers

  1. coastlines_sim2
  2. us_states
  3. study-area/study-area-merc.shp

Note that you will need to first import the study area layer and then add it to your map.

## OGR data source with driver: ESRI Shapefile 
## Source: "study-area/study-area-merc.shp", layer: "study-area-merc"
## with 1 features
## It has 1 fields
## Integer64 fields read as strings:  id

We’ve plotted each of them previously so we know the data are OK. How does it go?

Dealing with coordinate reference systems

Our data are in different coordinate reference systems. To account for that, we need reproject one layer to be the same as the other. TO reproject data we use the spTransform() function which takes 2 key arguments:

  1. a spatial lines, polygons or points object (that contains crs information) and
  2. a crs object which defines the CRS that we want to transform the data to.

NOTE: once again we are modifying the data. Think long and hard about what layers you want to reproject vs maintain the integrity of!

# epsg 3395 - global mercator
# reproject to something different
gulf_study_area_wgs84 <- spTransform(gulf_study_area,
                                     CRSobj = crs(us_states))

Your map should look like the one below

# plot data 
plot(coastlines_sim2, 
     main = "Coastlines with the NE state boundaries\n all data in the same CRS")
plot(us_states, 
     add = TRUE,
     col = "purple")
plot(gulf_study_area_wgs84, 
     add = TRUE,
     col = "red")

Map spatial extents

Our map is looking better. But we may want to zoom in to our study area. We also know that some of our data (ie the coastlines) are particularly complex makign for slower render times. Let’s crop our data to our study area extent to

  1. make a “zoomed in” map and to
  2. control the size of the data that we are working with by getting rid of data that we don’t need.

To crop the data - trimming or removing all of the data outside of our crop object, we will use the crop() function which takes 2 arguments

  1. the data that you wish to crop
  2. an extent object of an spatial object with an extent definition.
# crop the coastlines data to the spatial extent of the gulf study
# area extent
coastlines_crop <- crop(x = coastlines, 
                        y = gulf_study_area_wgs84)
plot(coastlines_crop)

Now - it’s your turn! Create a final study area map! Adjust the colors and layers to make it look nice. If you are more interested in learning ggplot, feel free to work with ggplot. If you’d like to use baseplot - feel free to use base.

Ggplot pros: templatted maps, easy to standardize, cleaner mapping code base plot pros: faster maps, more difficult

Tool Pros Cons
ggplot() templated maps, easy to standardize, clean mapping code, simple fast legends need to convert sp objects from readOGR() to a data frame
BASE R plot() faster mapping, supports sp objects natively legends are tedious to create and customize

===

Leaflet

Finally it is always nice to create interactive maps. This allows your colleagues to not only see but also interact with your data. Let’s use mapview() to create a quick interactive map.

To use mapview with multpile layers, you simply create a mapview object for each layer and then add them together to produce a final plot. Note that mapview will take care of spatial extent for you!

# create mapview map with multiple layers
m_coast <- mapview(coastlines_crop)
m_states <- mapview(us_states)

m_coast + m_states

Leaflet version!

# create a leaflet object
leaflet() %>%
  addTiles() %>% # add basemap to your map
  # then add a lines layer
  addPolylines(data = coastlines_crop, color = "#444444", weight = 1, smoothFactor = 0.5,
    opacity = 1.0) %>% 
  addPolygons(data = us_states, color = "#444444", fillColor = "green", weight = 1, smoothFactor = 0.5,
    opacity = 1.0)

Raster data in R

Next, let’s work with some raster data in R. We use the raster() package to open and manipulate raster data in R. This package has a large community around it and is a standard for most raster operations in R.

To load a raster layer with a single band - we use raster().

raster bands

raster bands

A raster layer can have one or depending on the format more than 1 band of information stored within it. Sometimes those bands are for images (see below) and will be RGB or in the case of a multi or hyperspectral remote sensing instrument - hundreds of bands across the light spectrum.

Sometimes those bands will be time series (for example climate data which we will work with tomorrow).

RGB raster bands

RGB raster bands

Below we will simply open a single band. The data format is .asc which is an ESRI format that is text based. ASC files contain a header where the key metadata are described.

ncols         2400
nrows         1560
xllcorner     -98.004166666667
yllcorner     18.004166666667
cellsize      0.0083333333333333
NODATA_value  -9999
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -2 0 0 0 0 0 0 -2 -2 -2 -2 -2 0 0 -4 0 -2 -2 -3 -2 -2 -3 -6 -5 -9 -12 -10 -7 -2 -2 -3 -3 -2 -4 -6 -8 -7 -8 -9 -9 -8 -9 -10 -10 -10 -11 -12 -12 -13 -12 -12 -12 -12 -13 -13 -13 -13 -14 -15 -14 -14 -14 -15 -14 -15 -14 -14 -15 -16 -16 -16 -16 -16 -16 -16 -17 -18 -19 -20 -20 -20 -20 -22 -22 -22 -22 -22 -21 -22 -23 -23 -23 -22 -23 -24 -25 -25 -25 -26 -26 -26 -26 -26 -25 -25 -24 -24 -24 -24 -24 -24 -23 -24 -25 -26 -26 -26 -27 -28 -27 -28 -28 -27 -27 -27 -27 -27 -28 -29 -29 -29 -29 -29 -28 -29 -30 -31 -31 -32 -32 -33 -33 -33 -34 -34 -34 -35 -36 -36 -36 -36 -36 -36 -36 -36 -36 -34 -36 -37 -37 -38 -38 -39 -39 -39

We can open an .asc layer using the raster() function. Note that this same process can be used with geotiffs and many other raster formats. The raster package is adept at figuring out what format of data you are providing it and using the correct drivers to open and read in the data!

The raster package also has a wrapper around the base plot() function allowing us to plot data using the same approach that we used above!

# load raster data in r
gulf_bathy <- raster("bathymetry/gom_bathy_srtm30plus_asc.asc")
plot(gulf_bathy)

Remove the box and axes.

# remove the box and axes from the plot
plot(gulf_bathy,
     box = FALSE,
     axes = FALSE)

Adjust the colors.

# remove the box and axes from the plot
plot(gulf_bathy,
     box = FALSE,
     axes = FALSE,
     col = grey(1:100/100),
     main = "grayscale bathymetric map")

Hillshade of the gulf coast

Plot the data .

# plot data - adjust the color to be grayscale
hillshade <- raster("hillshade/hillshade_3395.tif")
plot(hillshade,
     col = grey(1:100/100))

If you don’t want to crop your data - and simply want a quick plot of a particular spatial area, you can use the ext = argument with baseplot.

Below we specify the spatial extent of the gulf_study_area object that we used above to “zoom in” on just that area

plot(hillshade,
     main = "Bathymetric data for the gulf coast",
     col = grey(1:100/100),
     axes = FALSE,
     box = FALSE)

# add coastlines
plot(coastlines_crop, 
     add = TRUE)

when rasters don’t line up

we can use the projectRaster() function to reproject raster data in the same way we use spTransform() with vector data.

# reproject the raster data 
hillshade_wgs84 <- projectRaster(hillshade, 
                                 crs = crs(coastlines_crop))

plot(hillshade_wgs84)

We can crop raster data in the same way we crop vector data using the crop() function.

# Crop the data 
hillshade_wgs84 <- crop(hillshade_wgs84, gulf_study_area_wgs84)
plot(hillshade_wgs84)

Now let’s try to plot again - does it work?

plot(hillshade_wgs84,
     main = "Bathymetric data for the gulf coast",
     col = grey(1:100/100),
     axes = FALSE,
     box = FALSE)
# add coastlines
plot(coastlines_crop, 
     add = TRUE)

Challenge – Create a basemap of the gulf coast using the layers that we have been working with. On this map include

  1. This hillshade as a base map, cropped to cover a smaller area
  2. Coastline boundaries
  3. US State boundaries, cropped to cover a smaller area
  4. Bathemetric data

NOTE: use this time to create a basemap that your group can use. If you have other layers that you’d like to use - go for it!

When you create your map, be sure to

  1. CROP the hillshade for quicker plotting
  2. Experiment with the alpha argument to overlay a raster on top of a hillshade. Valid alpha values range between 0-1 (e.g. alpha = .5)
plot(hillshade_wgs84,
     main = "Bathymetric data for the gulf coast",
     col = grey(1:100/100),
     axes = FALSE,
     box = FALSE,
     ext = extent(gulf_study_area_wgs84),
     legend = FALSE)

# add coastlines
plot(coastlines_crop, 
     add = TRUE)
plot(gulf_bathy,
     add = TRUE,
     alpha = .5)
plot(us_states,
     add = TRUE)

Static basemaps in R

You can also create static basemaps quickly in R. Below we use ggmap() to create a basemap for a particular lat/long location.

# devtools::install_github("dkahle/ggmap")
library(ggplot2)
library(ggmap)

Let’s create a basemap!

https://earthdatascience.org/course-materials/earth-analytics/week-3/ggmap-basemap/

# get map
sq_map <- get_map(location = c(lon = -89.89, lat = 25.68),
                  maptype = "satellite",
                  source = "google", zoom = 5)

ggmap(sq_map)

Your turn – try a different maptype

Hint: use ??get_map to find different options

# get map
sq_map <- get_map(location = c(lon = -89.89, lat = 25.68),
                  maptype = "terrain",
                  source = "google", zoom=5)

ggmap(sq_map)
# get map
sq_map <- get_map(location = c(lon = -89.89, lat = 25.68),
                  maptype = "watercolor",
                  source = "stamen", zoom=5)

ggmap(sq_map)

Spatial queries of vector data using base R

# select all states that equal alabama 
texas <- us_states[us_states$NAME == "Texas",]
plot(texas)

Extra magic

Check out mapedit for vector data editing in R!

Check out mapedit for vector data editing in R!