In this lesson we will learn how to perform some basic cleaning and plotting of spatial data in R.
At the end of this 30 minute overview you will be able to: 1. Open a vector data layer in R using readOGR()
1. Open a raster data layer in R using raster()
1. Create basic maps using ggplot()
1. Reproject and crop raster and vector data
Intro to vector data in R - Earth Data Science website
There are many ways to import and map vector data in R.
To read the data, you have several options
sp
: Import shapefiles and other data using readOGR()
from the sp
packagesp
: more recently the sf
package has proved to be both faster and more efficient that sp
# unzip data
#library(utils)
setwd("~/Documents/data/oss-institute")
#setwd("~/Documents/github/oss-lessons/spatial-data-gis-law")
library(rgdal)
library(raster)
library(ggplot2)
library(rgeos)
library(mapview)
library(leaflet)
library(broom) # if you plot with ggplot and need to turn sp data into dataframes
options(stringsAsFactors = FALSE)
First, let’s download some data from natural earth.
# download the data
download.file("http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_coastline.zip",
destfile = 'coastlines.zip')
# unzip the file
unzip(zipfile = "coastlines.zip",
exdir = 'ne-coastlines-10m')
Next, we can open the data using readOGR
from the sp (spatial) package.
# load the data
coastlines <- readOGR("ne-coastlines-10m/ne_10m_coastline.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "ne-coastlines-10m/ne_10m_coastline.shp", layer: "ne_10m_coastline"
## with 4132 features
## It has 2 fields
Are the data points, lines of polygons?
CHALLENGE – Looking at the data, what are the 2 possible vector data structures that this data could be stored in?
# view spatial attributes
class(coastlines)
## [1] "SpatialLinesDataFrame"
## attr(,"package")
## [1] "sp"
extent(coastlines)
## class : Extent
## xmin : -180
## xmax : 180
## ymin : -85.22194
## ymax : 83.6341
crs(coastlines)
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
Super speedy quick plot with R baseplot … or not. Be patient - this object has a lot of complex features
plot(coastlines,
main = "Global Coastlines")
This particular layer is complex. There are many details in the boundaries as rendered that we may want if we zoom in but may not need to produce a global map. let’s Simplify it. The gSimplify
function is a part of the rgeos package. The simplify function removes vertices from complex lines. Remember that a line is composed of vertices. A circle is simply a line with lots of vertices - the more vertices it has, the more ‘round’ the line appears.
As you use this function keep in mind that you are modifying your data. You probably don’t want to do this if you are performing any sort of quantitative analysis on the data but you definitely want to do this if you are creating online maps and other visual products from your data.
The gSimplify
function takes 3 arguments
# simplify geometry
coastlines_simp <- gSimplify(coastlines,
tol = 3,
topologyPreserve = TRUE)
plot(coastlines_simp,
main = "map with boundaries simplified")
Notice that here the map plots faster, but now it looks blocky. We may have simplified TOO MUCH. let’s reduce the tol =
argument value to .1.
# simplify with a lower tolerance value (keeping more detail)
coastlines_sim2 <- gSimplify(coastlines,
tol = .1,
topologyPreserve = TRUE)
plot(coastlines_sim2,
main = "Map of coastlines - simplified geometry\n tol=.1")
That’s better. We now have enough detail for plotting purposes but have increased speed dramatically. These types of steps become important when creating online interactive maps to optimize speed.
IMPORTANT: when you modify the geometry you are also modifying the data - in this case any calculated perimeter or area values using these data will be compromised.
Less speedy plot with ggplot – but it looks so nice! NOTE: ggplot throws an error if you don’t include the data =
argument for some reason on your geom_ element. Be sure to always expliceltly include this.
# turn the data into a spatial data frame
coastlines_sim2_df <- SpatialLinesDataFrame(coastlines_sim2,
coastlines@data)
#tidy(coastlines_sim2_df)
# plot the data
ggplot() +
geom_path(data = coastlines_sim2_df, aes(x = long, y = lat, group = group)) +
labs(title = "Global Coastlines - using ggplot")
coord_fixed()
is your best friend – it will make the map x and y Also notice i’m using labs to add a title and x and y axis labels. Cool stuff.
ggplot() +
geom_path(data = coastlines_sim2_df, aes(x = long, y = lat, group = group)) +
coord_fixed() +
labs(title = "My awesome ggplot map of coastlines",
subtitle = "my awesome subtitle",
x = "", y = "") # cause we don't need x and y labels do we?
Make it interactive using mapview. Mapview is just a wrapper for leaflet. is a powerful javascript based tool that can be used to create interactive maps - like google maps. The mapview
library in R
is a wrapper around leaflet that allows you to quickly create interactive maps.
COOL TIP 1: note that when you knit to html your interactive map will be embedded in your html file! This means you can send your friends (or yourself! :) ) and interactive map! COOL TIP 2: you can also publish your interactive map to rpubs! Bryce will talk more about this later today.
# create leaflet
mapview(coastlines_sim2)
# create a leaflet object
leaflet(coastlines_sim2) %>%
addTiles() %>% # add basemap to your map
# then add a lines layer
addPolylines(color = "#444444", weight = 1, smoothFactor = 0.5,
opacity = 1.0)
This is all great but – a bit slow. Sf will be the wave of the future. It’s much faster than sp however isn’t fully supported across all tools and projects just yet… it will be! Below is just a quick example of how it works. We won’t use it for today’s lesson…
library(sf)
# import the data - sf is much faster
coastlines_sf <- st_read("ne-coastlines-10m/ne_10m_coastline.shp")
## Reading layer `ne_10m_coastline' from data source `/Users/lewa8222/Documents/data/oss-institute/ne-coastlines-10m/ne_10m_coastline.shp' using driver `ESRI Shapefile'
## Simple feature collection with 4132 features and 2 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -180 ymin: -85.22194 xmax: 180 ymax: 83.6341
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
# plotting is still a bit slow
plot(coastlines_sf[2])
SF
will have full ggplot support in the future but for now, you can only get to it by installing ggplot from the github dev branch. For this reason we won’t use sf in this workshop but know that it will be more popular in the upcoming years.
# this only works if you install ggplot from github
#devtools::install_github("tidyverse/ggplot2")
#library(ggplot2)
# the dev version is not currently installing properly
ggplot() +
geom_sf(data=coastlines_sf, aes(fill = featurecla))
Next, let’s import another data layer and plot it on top of our coastlines.
us_states <- readOGR("us-boundaries/us_bound_pop.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "us-boundaries/us_bound_pop.shp", layer: "us_bound_pop"
## with 58 features
## It has 14 fields
us_states
## class : SpatialPolygonsDataFrame
## features : 58
## extent : -124.7258, -66.94989, 24.49813, 49.38436 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 14
## names : AFFGEOI, STATEFP, STATENS, GEOID, STUSPS, NAME, LSAD, ALAND, AWATER, region, GEO_id2, GEO_ds_, POPGROUP_d, HD01_VD
## min values : 0400000US01, 01, 00068085, 01, AL, Alabama, 00, 102262419204, 1026252314, Midwest, 01, Alabama, 001, 10006693
## max values : 0400000US56, 56, 01785534, 56, WY, Wyoming, 00, 92790411854, 8504608966, West, 56, Wyoming, 001, 9900571
Use add = TRUE to plot us_states on top of
plot(coastlines_sim2,
main = "Coastlines with the NE state boundaries")
plot(us_states,
add = TRUE,
col = "purple")
CHALLENGE - Next, create a map with the following layers
Note that you will need to first import the study area layer and then add it to your map.
## OGR data source with driver: ESRI Shapefile
## Source: "study-area/study-area-merc.shp", layer: "study-area-merc"
## with 1 features
## It has 1 fields
## Integer64 fields read as strings: id
We’ve plotted each of them previously so we know the data are OK. How does it go?
Our data are in different coordinate reference systems. To account for that, we need reproject one layer to be the same as the other. TO reproject data we use the spTransform()
function which takes 2 key arguments:
NOTE: once again we are modifying the data. Think long and hard about what layers you want to reproject vs maintain the integrity of!
# epsg 3395 - global mercator
# reproject to something different
gulf_study_area_wgs84 <- spTransform(gulf_study_area,
CRSobj = crs(us_states))
Your map should look like the one below
# plot data
plot(coastlines_sim2,
main = "Coastlines with the NE state boundaries\n all data in the same CRS")
plot(us_states,
add = TRUE,
col = "purple")
plot(gulf_study_area_wgs84,
add = TRUE,
col = "red")
Our map is looking better. But we may want to zoom in to our study area. We also know that some of our data (ie the coastlines) are particularly complex makign for slower render times. Let’s crop our data to our study area extent to
To crop the data - trimming or removing all of the data outside of our crop object, we will use the crop()
function which takes 2 arguments
# crop the coastlines data to the spatial extent of the gulf study
# area extent
coastlines_crop <- crop(x = coastlines,
y = gulf_study_area_wgs84)
plot(coastlines_crop)
Now - it’s your turn! Create a final study area map! Adjust the colors and layers to make it look nice. If you are more interested in learning ggplot, feel free to work with ggplot. If you’d like to use baseplot - feel free to use base.
Ggplot pros: templatted maps, easy to standardize, cleaner mapping code base plot pros: faster maps, more difficult
Tool | Pros | Cons | |
---|---|---|---|
ggplot() | templated maps, easy to standardize, clean mapping code, simple fast legends | need to convert sp objects from readOGR() to a data frame |
|
BASE R plot() | faster mapping, supports sp objects natively | legends are tedious to create and customize | |
===
Finally it is always nice to create interactive maps. This allows your colleagues to not only see but also interact with your data. Let’s use mapview() to create a quick interactive map.
To use mapview with multpile layers, you simply create a mapview object for each layer and then add them together to produce a final plot. Note that mapview will take care of spatial extent for you!
# create mapview map with multiple layers
m_coast <- mapview(coastlines_crop)
m_states <- mapview(us_states)
m_coast + m_states
# create a leaflet object
leaflet() %>%
addTiles() %>% # add basemap to your map
# then add a lines layer
addPolylines(data = coastlines_crop, color = "#444444", weight = 1, smoothFactor = 0.5,
opacity = 1.0) %>%
addPolygons(data = us_states, color = "#444444", fillColor = "green", weight = 1, smoothFactor = 0.5,
opacity = 1.0)
Next, let’s work with some raster data in R
. We use the raster()
package to open and manipulate raster data in R
. This package has a large community around it and is a standard for most raster operations in R
.
To load a raster layer with a single band - we use raster()
.
A raster layer can have one or depending on the format more than 1 band of information stored within it. Sometimes those bands are for images (see below) and will be RGB or in the case of a multi or hyperspectral remote sensing instrument - hundreds of bands across the light spectrum.
Sometimes those bands will be time series (for example climate data which we will work with tomorrow).
Below we will simply open a single band. The data format is .asc which is an ESRI format that is text based. ASC files contain a header where the key metadata are described.
ncols 2400
nrows 1560
xllcorner -98.004166666667
yllcorner 18.004166666667
cellsize 0.0083333333333333
NODATA_value -9999
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -2 0 0 0 0 0 0 -2 -2 -2 -2 -2 0 0 -4 0 -2 -2 -3 -2 -2 -3 -6 -5 -9 -12 -10 -7 -2 -2 -3 -3 -2 -4 -6 -8 -7 -8 -9 -9 -8 -9 -10 -10 -10 -11 -12 -12 -13 -12 -12 -12 -12 -13 -13 -13 -13 -14 -15 -14 -14 -14 -15 -14 -15 -14 -14 -15 -16 -16 -16 -16 -16 -16 -16 -17 -18 -19 -20 -20 -20 -20 -22 -22 -22 -22 -22 -21 -22 -23 -23 -23 -22 -23 -24 -25 -25 -25 -26 -26 -26 -26 -26 -25 -25 -24 -24 -24 -24 -24 -24 -23 -24 -25 -26 -26 -26 -27 -28 -27 -28 -28 -27 -27 -27 -27 -27 -28 -29 -29 -29 -29 -29 -28 -29 -30 -31 -31 -32 -32 -33 -33 -33 -34 -34 -34 -35 -36 -36 -36 -36 -36 -36 -36 -36 -36 -34 -36 -37 -37 -38 -38 -39 -39 -39
We can open an .asc layer using the raster() function. Note that this same process can be used with geotiffs and many other raster formats. The raster package is adept at figuring out what format of data you are providing it and using the correct drivers to open and read in the data!
The raster package also has a wrapper around the base plot() function allowing us to plot data using the same approach that we used above!
# load raster data in r
gulf_bathy <- raster("bathymetry/gom_bathy_srtm30plus_asc.asc")
plot(gulf_bathy)
Remove the box and axes.
# remove the box and axes from the plot
plot(gulf_bathy,
box = FALSE,
axes = FALSE)
Adjust the colors.
# remove the box and axes from the plot
plot(gulf_bathy,
box = FALSE,
axes = FALSE,
col = grey(1:100/100),
main = "grayscale bathymetric map")
Plot the data .
# plot data - adjust the color to be grayscale
hillshade <- raster("hillshade/hillshade_3395.tif")
plot(hillshade,
col = grey(1:100/100))
If you don’t want to crop your data - and simply want a quick plot of a particular spatial area, you can use the ext =
argument with baseplot.
Below we specify the spatial extent of the gulf_study_area object that we used above to “zoom in” on just that area
plot(hillshade,
main = "Bathymetric data for the gulf coast",
col = grey(1:100/100),
axes = FALSE,
box = FALSE)
# add coastlines
plot(coastlines_crop,
add = TRUE)
we can use the projectRaster() function to reproject raster data in the same way we use spTransform() with vector data.
# reproject the raster data
hillshade_wgs84 <- projectRaster(hillshade,
crs = crs(coastlines_crop))
plot(hillshade_wgs84)
We can crop raster data in the same way we crop vector data using the crop() function.
# Crop the data
hillshade_wgs84 <- crop(hillshade_wgs84, gulf_study_area_wgs84)
plot(hillshade_wgs84)
Now let’s try to plot again - does it work?
plot(hillshade_wgs84,
main = "Bathymetric data for the gulf coast",
col = grey(1:100/100),
axes = FALSE,
box = FALSE)
# add coastlines
plot(coastlines_crop,
add = TRUE)
Challenge – Create a basemap of the gulf coast using the layers that we have been working with. On this map include
NOTE: use this time to create a basemap that your group can use. If you have other layers that you’d like to use - go for it!
When you create your map, be sure to
alpha = .5
)plot(hillshade_wgs84,
main = "Bathymetric data for the gulf coast",
col = grey(1:100/100),
axes = FALSE,
box = FALSE,
ext = extent(gulf_study_area_wgs84),
legend = FALSE)
# add coastlines
plot(coastlines_crop,
add = TRUE)
plot(gulf_bathy,
add = TRUE,
alpha = .5)
plot(us_states,
add = TRUE)
You can also create static basemaps quickly in R. Below we use ggmap() to create a basemap for a particular lat/long location.
# devtools::install_github("dkahle/ggmap")
library(ggplot2)
library(ggmap)
Let’s create a basemap!
https://earthdatascience.org/course-materials/earth-analytics/week-3/ggmap-basemap/
# get map
sq_map <- get_map(location = c(lon = -89.89, lat = 25.68),
maptype = "satellite",
source = "google", zoom = 5)
ggmap(sq_map)
Your turn – try a different maptype
Hint: use ??get_map to find different options
# get map
sq_map <- get_map(location = c(lon = -89.89, lat = 25.68),
maptype = "terrain",
source = "google", zoom=5)
ggmap(sq_map)
# get map
sq_map <- get_map(location = c(lon = -89.89, lat = 25.68),
maptype = "watercolor",
source = "stamen", zoom=5)
ggmap(sq_map)
# select all states that equal alabama
texas <- us_states[us_states$NAME == "Texas",]
plot(texas)