# Check whether a package is installed and install if not.
# This will also load the package if already installed!
if(!require("sf")) install.packages("sf")
if(!require("tmap")) install.packages("tmap")
if(!require("dplyr")) install.packages("dplyr")
if(!require("COVID19")) install.packages("COVID19")
1 Objectives
- Import and review simple features data;
- Manipulate simple features data;
- Draw a static map;
- Draw an animated map.
2 Start a Script
For this lab or project, begin by:
- Starting a new
R
script - Create a good header section and table of contents
- Save the script file with an informative name
- set your working directory
Aim to make the script a future reference for doing things in R
!
3 Introduction
Many geographic information system (GIS) programmes make it extraordinarily easy for users to create maps by loading a file with a geographic identifier and data to plot. They are designed to be ‘point and click’, meaning that they do a lot of the hard work behind the scenes. We have to do this work ourselves in R
as it has no innate knowledge of what we want to graph, so we have to provide every detail. But this provides superior flexibility compared to ‘point and click’ programmes! Let’s take a look at making some basic maps using a dataset you’ve probably heard of before…
4 Packages and Data
For this lab we are going to use the official COVID-19 statistics from the dedicated package (COVID19
). Let’s first install and load the packages we need. We will be using sf
to handle simple features, tmap
for drawing maps, and dplyr
for manipulating data.
4.1 COVID-19 Data
We can get up-to-date global data for COVID-19:
# Download data - this may take a while!
= COVID19::covid19() covid_data
We have invested a lot of time and effort in creating COVID-19 Data
Hub, please cite the following when using it:
Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open
Source Software 5(51):2376, doi: 10.21105/joss.02376
The implementation details and the latest version of the data are
described in:
Guidotti, E., (2022), "A worldwide epidemiological database for
COVID-19 at fine-grained spatial resolution", Sci Data 9(1):112, doi:
10.1038/s41597-022-01245-1
To print citations in BibTeX format use:
> print(citation('COVID19'), bibtex=TRUE)
To hide this message use 'verbose = FALSE'.
# Print number of columns
ncol(covid_data)
[1] 47
# Print number of rows
nrow(covid_data)
[1] 281643
# Print variable names
names(covid_data)
[1] "id" "date"
[3] "confirmed" "deaths"
[5] "recovered" "tests"
[7] "vaccines" "people_vaccinated"
[9] "people_fully_vaccinated" "hosp"
[11] "icu" "vent"
[13] "school_closing" "workplace_closing"
[15] "cancel_events" "gatherings_restrictions"
[17] "transport_closing" "stay_home_restrictions"
[19] "internal_movement_restrictions" "international_movement_restrictions"
[21] "information_campaigns" "testing_policy"
[23] "contact_tracing" "facial_coverings"
[25] "vaccination_policy" "elderly_people_protection"
[27] "government_response_index" "stringency_index"
[29] "containment_health_index" "economic_support_index"
[31] "administrative_area_level" "administrative_area_level_1"
[33] "administrative_area_level_2" "administrative_area_level_3"
[35] "latitude" "longitude"
[37] "population" "iso_alpha_3"
[39] "iso_alpha_2" "iso_numeric"
[41] "iso_currency" "key_local"
[43] "key_google_mobility" "key_apple_mobility"
[45] "key_jhu_csse" "key_nuts"
[47] "key_gadm"
Looks like we have a lot of data here (281,639 observations of 47 variables!), so we will need to think carefully about what we want to display on a map!
4.2 Geographic Data
We will need geographic data alongside our COVID-19 data, otherwise we won’t be able to plot it! Luckily we can use rnaturalearth
for this:
if(!require("magrittr")) install.packages("magrittr") # for the pipe operator %>%
if(!require("devtools")) install.packages("devtools") # for installing packages from GitHub
# Due to some issues with the CRAN version of this package, it must be installed from GitHub using the following command
if(!require("rnaturalearth")) devtools::install_github("ropensci/rnaturalearth")
# Download geographic data as a shapefile
= rnaturalearth::ne_download(returnclass = "sf") world_rnatural
Reading layer `ne_110m_admin_0_countries' from data source
`C:\Users\00708040\AppData\Local\Temp\RtmpE7ga5L\ne_110m_admin_0_countries.shp'
using driver `ESRI Shapefile'
Simple feature collection with 177 features and 168 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: WGS 84
# Select key variables and assign them to a new object
= world_rnatural %>% # Pipe operator
world_iso select(NAME_LONG, ISO_A3_EH, POP_EST, GDP_MD, CONTINENT) # Select variables
To see what’s in the world_iso
object we can plot it, with the default setting in sf
showing the geographic distribution of each variable:
# Plot a subset of our data
plot(world_iso)
Our first maps! These variables are pretty self-explanatory, but a point to note is that ISO_A3_EH
is the 3 letter country code.
5 Simple Features
5.1 What are Simple Features?
Simple features are a way of representing geographic data in R
. They are a type of vector data, which means that they are made up of points, lines, or polygons. Points are used to represent single locations, lines are used to represent linear features such as roads, and polygons are used to represent areas such as countries. Simple features are stored in a data frame (shapefiles), with each row representing a single feature. The data frame has a special column called geometry
which contains the coordinates of the feature.
5.2 Simple Features in R
The sf
package is used to handle simple features in R
. It is a very powerful package, and we will only be using a small part of its functionality in this lab. The sf
package is designed to work with the dplyr
package, which is used for data manipulation. This means that we can use dplyr
functions to manipulate simple features data. Let’s take a look at the world_iso
object we created earlier:
# Print the first 6 rows of the data
head(world_iso)
Simple feature collection with 6 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
Geodetic CRS: WGS 84
NAME_LONG ISO_A3_EH POP_EST GDP_MD CONTINENT
1 Fiji FJI 889953 5496 Oceania
2 Tanzania TZA 58005463 63177 Africa
3 Western Sahara ESH 603253 907 Africa
4 Canada CAN 37589262 1736425 North America
5 United States USA 328239523 21433226 North America
6 Kazakhstan KAZ 18513930 181665 Asia
geometry
1 MULTIPOLYGON (((180 -16.067...
2 MULTIPOLYGON (((33.90371 -0...
3 MULTIPOLYGON (((-8.66559 27...
4 MULTIPOLYGON (((-122.84 49,...
5 MULTIPOLYGON (((-122.84 49,...
6 MULTIPOLYGON (((87.35997 49...
We can see that the geometry
column contains the coordinates of each country. It is important to note that the geometry
column is a special type of column called a sfc_GEOMETRY
column. This is a special type of column that contains the coordinates of the feature. The key features of sfc_GEOMETRY
columns are:
- Each row is a feature that we could plot; since we called
head()
we have only see the first 6 even though the full dataset has more; - We have 5 fields. Each column is a field with (potentially) useful information about the feature.The geometry column is not considered a field;
- We are told this is a collection of multipolygons;
- We are told the bounding box for our data (the most western/eastern longitudes and northern/southern latitudes);
- We are told the Coordinate Reference System (CRS), which in this case is “WGS 84.” CRSs are cartographers’ ways of telling each other what system they used for describing points on the earth. Cartographers need to pick an equation for an ellipsoid to approximate earth’s shape since it’s slightly pear-shaped. Cartographers also need to determine a set of reference markers–known as a datum–to use to set coordinates, as earth’s tectonic plates shift ever so slightly over time. Together, the ellipsoid and datum become a CRS. WGS 84 is one of the most common CRSs and is the standard used for GPS applications;
- Finally, we are provided a column called “geometry.” This column contains everything that
R
will need to draw each of the countries.
5.3 Simple Features Manipulation
We can use dplyr
to manipulate simple features data. Let’s say we want to select only the countries in Europe. We can do this using the filter()
function:
# Filter the data to only include countries in Europe
<- world_iso %>%
world_europe filter(CONTINENT == "Europe") # Filter the data
5.4 Transforming Simple Features
n issue with the result from a data visualisation perspective is that this unprojected visualisation distorts the world: countries such as Greenland at high latitudes appear bigger than the actually are. Let’s plot this to see what we mean:
# Plot world
plot(st_geometry(world_iso))
To overcome this issue we will project the our geographic data:
# Transform the map
= world_iso %>%
world_projected st_transform("+proj=moll") # Transform the map
# Plot transformed map
plot(st_geometry(world_projected)) # Plot the map
We can see that the map is now projected, and Greenland is no longer distorted. We can also see that the geometry
column is now a sfc_GEOMETRY
column, which means that we can plot it directly using plot()
.
6 Combining Data
Now we have our county geospatial data and our COVID-19 data - how in the world do we plot this? For this lab there are three simple steps to prepare your data for graphing.
- Import all data (already completed above);
- Clean the geospatial data (already completed above);
- Merge the data together.
# Merge the data using dplyr
= dplyr::left_join(world_projected, # Dataset 1
combined_df # Dataset 2
covid_data, by = c("ISO_A3_EH" = "iso_alpha_3")) # Variables
6.1 Calculating Area
The sf
package provides a wide range of functions for calculating geographic variables such as object centroid, bounding boxes, lengths and area. We can use this to calculate the population density of each country:
# Calculate country area (km)
$Area_km = as.numeric(st_area(combined_df)) / 1e6 # Divide by 1e6 to convert to km
combined_df
# Calculate population density
$`Pop/km2` = as.numeric(combined_df$POP_EST) / combined_df$Area_km combined_df
7 Plotting Data
7.1 sf
Plots
We can very easily graph our COVID-19 data for a single day using the sf
package’s plot()
function:
# Filter data for 24th May 2020
= combined_df %>%
data_yesterday filter(date == "2020-05-24") # Filter data
# Plot data
plot(data_yesterday, max.plot = 53)
Let’s just a look at a single variable, deaths, as there are too many above:
# Plot data for a single variable
plot(data_yesterday["deaths"])
# Improve the plot
plot(data_yesterday["deaths"], # Data
axes = TRUE, # Add axes
graticule = TRUE, # Add graticule
main = "COVID-19 Deaths on 24th May 2020") # Add title
We can also plot multiple variables on the same plot:
# Plot multiple variables
plot(data_yesterday[c("deaths", "tests")], # Data
axes = TRUE, # Add axes
graticule = TRUE) # Add graticule
7.2 tmap
Plots
The tmap
package is a very powerful package for plotting geographic data. It is designed to work with sf
objects, and provides a wide range of functions for plotting data. Let’s plot the COVID-19 data for a single day:
# Plot data
tm_shape(data_yesterday) + # Data
tm_polygons("deaths", # Variable
title = "Deaths", # Title
palette = "Reds", # Colour palette
border.col = "black") + # Border colour
tm_layout(legend.outside = TRUE, # Move legend outside of plot
legend.outside.position = "bottom", # Position legend at bottom
legend.frame = TRUE) # Add legend frame
You will notice that tmap
uses +
to add elements like ggplot2
. These maps are pretty ugly though, so let’s change the colour palette:
# Plot data
tm_shape(data_yesterday) + # Data
tm_polygons("deaths", # Variable
title = "Deaths", # Title
palette = "viridis", # Colour palette
border.col = "black") + # Border colour
tm_layout(legend.outside = TRUE, # Move legend outside of plot
legend.outside.position = "bottom", # Position legend at bottom
legend.frame = TRUE) # Add legend frame
The map can be further improved by adding graticules representing the curvature of the Earth, created as follows:
# Add graticules
= st_graticule(data_yesterday) data_yesterday.g
We can then plot the graticules and the data together while removing the plot border:
# Plot data
tm_shape(data_yesterday.g) + # Data
tm_lines(col = "grey") + # Add graticules
tm_shape(data_yesterday) + # Data
tm_polygons("deaths", # Variable
title = "Deaths", # Title
palette = "viridis", # Colour palette
border.col = "black") + # Border colour
tm_layout(legend.outside = TRUE, # Move legend outside of plot
legend.outside.position = "bottom", # Position legend at bottom
legend.frame = TRUE, frame = FALSE) # Add legend frame
7.3 ggplot2
Plots
The ggplot2
package is a very powerful package for plotting data. It can be used with sf
objects, and provides a wide range of functions for plotting data. Let’s plot the COVID-19 data for a single day:
# Load ggplot2
if (!require("ggplot2")) install.packages("ggplot2")
Loading required package: ggplot2
# Plot data
ggplot(data_yesterday) + # Data
geom_sf(aes(fill = deaths)) + # Variable
scale_fill_viridis_c(option = "C") + # Colour palette
theme_void() # Remove background
We can customise the plot by adding a title and changing the colour palette:
# Plot data
ggplot(data_yesterday) + # Data
geom_sf(aes(fill = deaths)) + # Variable
scale_fill_viridis_c(option = "A") + # Colour palette
theme_void() + # Remove background
labs(title = "COVID-19 Deaths on 24th May 2020") # Add title
7.4 Animated Maps
Animated graphics can communicate change effectively, and animated maps are useful for showing evolving geographic phenomena, such as the spread of COVID-19 worldwide. Animated maps can be created with tmap
by extending the tm_facet()
. Let’s start by creating a facetted map showing the total number of deaths on the first day of each month in our data:
# Create a new variable - date as a text string
$Date = as.character(combined_df$date)
combined_df
tm_shape(data_yesterday.g) +
# Graticule colour
tm_lines(col = "grey") +
# Specify object shape
tm_shape(data_yesterday) +
# Draw polygons
tm_polygons(
"Pop/km2",
palette = "viridis",
style = "log10_pretty",
n = 3
+
) # Select data from 1st every month
tm_shape(combined_df %>% filter(grepl(pattern = "01$", date))) +
# Draw dots
tm_dots(size = "deaths", col = "red") +
# Create a facet (i.e. plot for each month)
tm_facets("Date", nrow = 1, free.scales.fill = FALSE) +
# ALter legend
tm_layout(
legend.outside.position = "bottom",
legend.stack = "horizontal"
)
To turn this into an animated map we need to make some small changes to the code above:
# Install gifski
if(!require("gifski")) install.packages("gifski")
= tm_shape(data_yesterday.g) + # Graticule colour
m tm_lines(col = "grey") + # Specify object shape
tm_shape(data_yesterday) + # Draw polygons
tm_polygons(
"Pop/km2",
palette = "viridis",
style = "log10_pretty",
n = 3
+
) tm_shape(combined_df %>% filter(grepl(pattern = "01$", date))) + # Select data from 1st every month
tm_dots(size = "deaths", col = "red") + # Draw dots
tm_facets(along = "Date", free.coords = FALSE) + # Create a facet (i.e. plot for each month)
tm_layout(legend.outside = TRUE) # Alter legend
tmap_animation(m, "covid-19-animated-map-test.gif", width = 800) # Create animation
8 Activities
8.1 Download Data
Download the county-level data for the UK with this command:
= COVID19::covid19(country = "GBR", # Country code
uk_city level = 2) # Level of administrative division
8.2 Plot UK Data
Explore the geographic distribution of the outbreak using the lab code as a guide. Produce a static map of the UK infection on 2021-01-01.
💡 Click here to view a solution
# Download geographic data as a shapefile
= rnaturalearth::ne_download(returnclass = "sf")
world_rnatural
# Select key variables and assign them to a new object
= world_rnatural %>% # Data
uk_iso select(NAME_LONG, ISO_A3_EH, POP_EST, GDP_MD, CONTINENT) %>% # Variables
filter(ISO_A3_EH == "GBR") # Filter for UK
# Merge the geographic data with the COVID-19 data
= left_join(uk_iso, uk_city, # Data
combined_df by = c("ISO_A3_EH" = "iso_alpha_3")) # Variables
# Filter data for 1st Jan 2021
= combined_df %>% # Data
data_jan filter(date == "2021-01-01") # Date
# Plot data
ggplot(data_jan) + # Data
geom_sf(aes(fill = deaths)) + # Variable
scale_fill_viridis_c(option = "C") + # Colour palette
theme_void() # Remove background
8.3 Plot Europe Data
Explore the geographic distribution of the outbreak using the lab code as a guide. Produce a static map of the European infection on 2021-01-01.
💡 Click here to view a solution
# Download covid-19 data
= COVID19::covid19(continent = "Europe", # Country code
europe level = 2) # Level of administrative division
# Download geographic data as a shapefile
= rnaturalearth::ne_download(returnclass = "sf")
world_rnatural
# Select key variables and assign them to a new object
= world_rnatural %>%
europe_iso select(NAME_LONG, ISO_A3_EH, POP_EST, GDP_MD, CONTINENT) %>%
filter(CONTINENT == "Europe")
# Merge the geographic data with the COVID-19 data
= left_join(europe_iso, europe, by = c("ISO_A3_EH" = "iso_alpha_3"))
combined_df
# Filter data for 1st Jan 2021
= combined_df %>%
data_jan filter(date == "2021-01-01")
# Plot data
ggplot(data_jan) +
geom_sf(aes(fill = deaths)) +
scale_fill_viridis_c(option = "C") +
theme_void()
8.4 Animated Map
Produce an animated map of the UK infection from the start of the pandemic to the present day.
💡 Click here to view a solution
# Download geographic data as a shapefile
= rnaturalearth::ne_download(returnclass = "sf")
world_rnatural
# Select key variables and assign them to a new object
= world_rnatural %>%
uk_iso select(NAME_LONG, ISO_A3_EH, POP_EST, GDP_MD, CONTINENT) %>%
filter(ISO_A3_EH == "GBR")
# Merge the geographic data with the COVID-19 data
= left_join(uk_iso, uk_city, by = c("ISO_A3_EH" = "iso_alpha_3"))
combined_df
# Plot data
= tm_shape(combined_df) +
m tm_polygons(
"deaths",
palette = "viridis",
style = "log10_pretty",
n = 3
+
) tm_facets(along = "date", free.coords = FALSE) +
tm_layout(legend.outside = TRUE)
tmap_animation(m, "covid-19-animated-map-uk.gif", width = 800)
9 Recap
- Geographic data can be downloaded from the internet using the
rnaturalearth
package; - Geographic data can be merged with COVID-19 data using the
left_join()
function; - Geographic data can be plotted using the
ggplot2
package; - Geographic data can be plotted using the
tmap
package; - Animated maps can be created using the
tmap
package.