June 6, 2014 | Alan
Datazones in Scotland are geographical areas used for producing small area statistics that allow comparison of economic, environmental, health and demographic data between zones. As part of the Future Cities Demonstrator, OPEN GLASGOW has curated a large set of open data which includes hospital admission data. Hospital admission data can be indicative of the health demographic per datazone. This post describes how to use the AnalytiXagility platform to interact with the Google Maps API and overlay a choropleth layer of hospital admission data for patients with respiratory disease in Glasgow.
Output
Data source
Future Cities is a project that aims to integrate city systems, enabling the linkage of urban data that will allow for exploration of new approaches to deliver a good local economy, excellent quality of life, and to promote better environmental awareness. OPEN GLASGOW is the product of a £24m investment to create a Future Cities Demonstrator, putting plans for integrating services into effect on a scale not seen before in the UK.
The dataset we have used in this example has been downloaded from OPEN GLASGOW, and contains information about counts and rate of patients admitted into Glasgow hospitals by data zone from 2002-2011:
We have loaded this data into the AnalytiXagility platform, and named it respiratory-disease-in-both-sexes-2002-2011, with an accompanying metadata txt file respiratory-disease-in-both-sexes-2002-2011_metadata.txt
Visualising choropleth layers over maps in R requires the information to be layered contained within a shapefile. A shapefile spatially describes vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes.
Shapefiles from UK sources tend to use the Ordnance Survey National Grid (EPSG:27700) referencing system, whereas we need to use the World Geodetic System (EPSG:4326). To make life easy for you (and us!) we have already converted this shapefile and transformed it onto the appropriate reference system and loaded it into the AnalytiXagility platform as a data table:
scotland_datazones_shp
Learning outcomes
The packages explored in this post are:
ggmap
dplyr
reshape2
RgoogleMaps
Workflow
Step 1 – Set up environment
library(ggmap)
library(dplyr)
library(reshape2)library(RgoogleMaps)
Import the scottish_datazones_shp
table into the workspace:
sco_shp <- xap.read_table('scottish_datazones_shp')
Sort table by row
column:
sco_shp <- sco_shp[with(sco_shp, order(row)), ]
Import the respiratory disease table:
respiratory_disease <- xap.read_table('respiratory-disease-both-sexes-2002-2011')
Step 2 – Data prep
Let’s examine the data to see what we’re dealing with:
head(sco_shp)
## row long lat order hole piece group id
## 1 1 -2.268 57.1 1 FALSE 1 S01000001.1 S01000001
## 2 2 -2.268 57.1 2 FALSE 1 S01000001.1 S01000001
## 3 3 -2.268 57.1 3 FALSE 1 S01000001.1 S01000001
## 4 4 -2.267 57.1 4 FALSE 1 S01000001.1 S01000001
## 5 5 -2.267 57.1 5 FALSE 1 S01000001.1 S01000001
## 6 6 -2.267 57.1 6 FALSE 1 S01000001.1 S01000001
sco_shp
is a data frame that contains coordinate information for individual datazones. Datazones are defined in id
, with the trace of the outline for each zone characterised in long
and lat
, following the order defined in order
.
head(respiratory_disease)
## geocode zonename hsa69_02 hsa69_03 hsa69_04 hsa69_05 hsa69_06
## 1 S01003025 Carmunnock South 8 6 4 3 5
## 2 S01003026 Carmunnock South 13 13 17 19 26
## 3 S01003027 Darnley East 6 14 16 9 10
## 4 S01003028 Glenwood South 21 18 18 17 18
## 5 S01003029 Carmunnock South 17 15 20 20 12
## 6 S01003030 Glenwood South 13 12 12 14 17
## hsa69_07 hsa69_08 hsa69_09 hsa69_10 hsa69_11 hsr69_02 hsr69_03 hsr69_04
## 1 12 9 8 5 10 1088 816 538
## 2 35 23 12 10 16 2041 1970 2644
## 3 12 12 9 19 16 518 1189 1341
## 4 29 29 37 44 40 2521 2158 2095
## 5 35 38 31 28 27 2273 1756 2105
## 6 23 16 21 18 31 1184 1098 1077
## hsr69_05 hsr69_06 hsr69_07 hsr69_08 hsr69_09 hsr69_10 hsr69_11
## 1 422 676 1567 1110 970 602 1245
## 2 3668 4915 6629 3566 1794 1425 2219
## 3 780 876 1036 1052 819 1737 1476
## 4 1860 1901 3122 3176 4031 4762 4396
## 5 2105 1288 3302 3622 2884 2644 2574
## 6 1284 1561 2048 1415 1857 1572 2715
respiratory_disease
is a data frame that contains datazone codes along with datazone names and count and rate information for hospital admissions for respiratory disease from 2002-2011. Column names beginning with hsa
… are count values, and hsr...
are rate (per 100,000) values. Column name ends in the year observed, e.g. hsa69_02
are count values gathered in 2002.
To generate a mappable choropleth layer, we must merge the sco_shp
dataframe together with the respiratory_disease
data frame. This will generate a data frame that contains the required co-ordinate outline information for each datazone along with the hospital admission activity.
Some merging functions in R can mess with the ordering of the resulting data frame, the left_join()
function from the dplyr
package retains the order, so we will use that. Apply left_join
to a common column in sco_shp
and respiratory_disease
:
colnames(respiratory_disease)[1] <- "id"
resp_merge <- left_join(respiratory_disease, sco_shp, by = "id")
Step 3 – Get base map
The ggmap
package allows for easy visualization of spatial data and models on top of Google Maps. Use the get_map()
function to interact with the Google Maps API and download a map of Glasgow:
glaMap <- get_map(location = "glasgow", zoom = 13)
gla <- ggmap(glaMap, extent = "panel", maprange = FALSE)
Step 3 – Plot hospital admission rates for 2011 only
ggmap
allows layering of data in the same way ggplot2
does. Use geom_polygon
to define the aesthetic for the choropleth layer, and select hsr69_11
to visualise hospital admission rates for 2011:
p <- gla +
geom_polygon(data = resp_merge, aes(x = long, y = lat, group = group, fill = hsr69_11), alpha = 0.5) + ggtitle("Glasgow - Hospital Respiratory Disease Admission Rate n(per 100,000) by data zone") +
scale_fill_continuous(low = "white", high = "red") + coord_map(projection = "bonne", lat0 = 58) +
theme(plot.title = element_text(face = "bold", size = 14), legend.position = "none") +
geom_path(data = resp_merge, aes(long, lat, group = group, fill = hsr69_11, alpha = 0.3), colour ="grey", lwd = 0.05)
plot(p)
Step 4 – Plot hospital admission rates for all years and compare
Vectorise the columns containing counts and rates into simply variable
and value
by using the melt()
function from the reshape2
package:
resp_melt <- melt(resp_merge, id.vars = c("id", "zonename", "order", "order", "row", "long", "hole", "lat", "group", "piece"))
For now, we are only interested in rates of hospital admissions, use the grep()
function to select what we need:
resp_melt_rate <- resp_melt[grep("hsr", resp_melt$variable), ]
As before, use geom_polygon
to define the aesthetic for the choropleth layer and append a facet layer for variable
:
q <- gla +
geom_polygon(data = resp_melt_rate, aes(x = long, y = lat, group = group, fill = value), alpha = 0.8) + facet_wrap(~variable) +
ggtitle("Glasgow - Hospital Respiratory Disease Admission rate (per 100,000)n by data zone 2002-2011") + scale_fill_continuous(low = "white", high = "red") +
coord_map(projection = "bonne", lat0 = 58) +
theme(plot.title = element_text(face = "bold", size = 14))
plot(q)
Further reading
http://futurecity.glasgow.gov.uk/
http://cran.r-project.org/web/packages/ggmap/index.html
http://cran.r-project.org/web/packages/dplyr/index.html
http://blog.rstudio.org/2014/01/17/introducing-dplyr/
http://cran.r-project.org/web/packages/reshape2/index.html
http://cran.r-project.org/web/packages/RgoogleMaps/index.html
http://www.ordnancesurvey.co.uk/
http://spatial.ly/2013/12/introduction-spatial-data-ggplot2/