Mapping Hospital Admission Data to Scottish Datazone

June 6, 2014 | Alan

Datazones in Scotland are geographical areas used for producing small area statistics that allow comparison of economic, environmental, health and demographic data between zones. As part of the Future Cities Demonstrator, OPEN GLASGOW has curated a large set of open data which includes hospital admission data. Hospital admission data can be indicative of the health demographic per datazone. This post describes how to use the AnalytiXagility platform to interact with the Google Maps API and overlay a choropleth layer of hospital admission data for patients with respiratory disease in Glasgow.

Output

blog_resp

Data source

Future Cities is a project that aims to integrate city systems, enabling the linkage of urban data that will allow for exploration of new approaches to deliver a good local economy, excellent quality of life, and to promote better environmental awareness. OPEN GLASGOW is the product of a £24m investment to create a Future Cities Demonstrator, putting plans for integrating services into effect on a scale not seen before in the UK.

The dataset we have used in this example has been downloaded from OPEN GLASGOW, and contains information about counts and rate of patients admitted into Glasgow hospitals by data zone from 2002-2011:

http://data.glasgow.gov.uk/dataset/hospital-admissions-for-respiratory-disease-in-both-sexes

We have loaded this data into the AnalytiXagility platform, and named it respiratory-disease-in-both-sexes-2002-2011, with an accompanying metadata txt file respiratory-disease-in-both-sexes-2002-2011_metadata.txt

Visualising choropleth layers over maps in R requires the information to be layered contained within a shapefile. A shapefile spatially describes vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes.

Shapefiles from UK sources tend to use the Ordnance Survey National Grid (EPSG:27700) referencing system, whereas we need to use the World Geodetic System (EPSG:4326). To make life easy for you (and us!) we have already converted this shapefile and transformed it onto the appropriate reference system and loaded it into the AnalytiXagility platform as a data table:

scotland_datazones_shp

Learning outcomes

The packages explored in this post are:

  • ggmap
  • dplyr
  • reshape2
  • RgoogleMaps

Workflow

Step 1 – Set up environment

library(ggmap)
library(dplyr)
library(reshape2)
library(RgoogleMaps)

Import the scottish_datazones_shp table into the workspace:

sco_shp <- xap.read_table('scottish_datazones_shp')

Sort table by row column:

sco_shp <- sco_shp[with(sco_shp, order(row)), ]

Import the respiratory disease table:

respiratory_disease <- xap.read_table('respiratory-disease-both-sexes-2002-2011')

Step 2 – Data prep

Let’s examine the data to see what we’re dealing with:

head(sco_shp)
##   row   long  lat order  hole piece       group        id
## 1   1 -2.268 57.1     1 FALSE     1 S01000001.1 S01000001
## 2   2 -2.268 57.1     2 FALSE     1 S01000001.1 S01000001
## 3   3 -2.268 57.1     3 FALSE     1 S01000001.1 S01000001
## 4   4 -2.267 57.1     4 FALSE     1 S01000001.1 S01000001
## 5   5 -2.267 57.1     5 FALSE     1 S01000001.1 S01000001
## 6   6 -2.267 57.1     6 FALSE     1 S01000001.1 S01000001

sco_shp is a data frame that contains coordinate information for individual datazones. Datazones are defined in id, with the trace of the outline for each zone characterised in long and lat, following the order defined in order.

head(respiratory_disease)
##     geocode         zonename hsa69_02 hsa69_03 hsa69_04 hsa69_05 hsa69_06
## 1 S01003025 Carmunnock South        8        6        4        3        5
## 2 S01003026 Carmunnock South       13       13       17       19       26
## 3 S01003027     Darnley East        6       14       16        9       10
## 4 S01003028   Glenwood South       21       18       18       17       18
## 5 S01003029 Carmunnock South       17       15       20       20       12
## 6 S01003030   Glenwood South       13       12       12       14       17
##   hsa69_07 hsa69_08 hsa69_09 hsa69_10 hsa69_11 hsr69_02 hsr69_03 hsr69_04
## 1       12        9        8        5       10     1088      816      538
## 2       35       23       12       10       16     2041     1970     2644
## 3       12       12        9       19       16      518     1189     1341
## 4       29       29       37       44       40     2521     2158     2095
## 5       35       38       31       28       27     2273     1756     2105
## 6       23       16       21       18       31     1184     1098     1077
##   hsr69_05 hsr69_06 hsr69_07 hsr69_08 hsr69_09 hsr69_10 hsr69_11
## 1      422      676     1567     1110      970      602     1245
## 2     3668     4915     6629     3566     1794     1425     2219
## 3      780      876     1036     1052      819     1737     1476
## 4     1860     1901     3122     3176     4031     4762     4396
## 5     2105     1288     3302     3622     2884     2644     2574
## 6     1284     1561     2048     1415     1857     1572     2715

respiratory_disease is a data frame that contains datazone codes along with datazone names and count and rate information for hospital admissions for respiratory disease from 2002-2011. Column names beginning with hsa… are count values, and hsr... are rate (per 100,000) values. Column name ends in the year observed, e.g. hsa69_02 are count values gathered in 2002.

To generate a mappable choropleth layer, we must merge the sco_shp dataframe together with the respiratory_disease data frame. This will generate a data frame that contains the required co-ordinate outline information for each datazone along with the hospital admission activity.

Some merging functions in R can mess with the ordering of the resulting data frame, the left_join() function from the dplyr package retains the order, so we will use that. Apply left_join to a common column in sco_shp and respiratory_disease:

colnames(respiratory_disease)[1] <- "id"
resp_merge <- left_join(respiratory_disease, sco_shp, by = "id")

Step 3 – Get base map

The ggmap package allows for easy visualization of spatial data and models on top of Google Maps. Use the get_map() function to interact with the Google Maps API and download a map of Glasgow:

glaMap <- get_map(location = "glasgow", zoom = 13)
gla <- ggmap(glaMap, extent = "panel", maprange = FALSE)

Step 3 – Plot hospital admission rates for 2011 only

ggmap allows layering of data in the same way ggplot2 does. Use geom_polygon to define the aesthetic for the choropleth layer, and select hsr69_11 to visualise hospital admission rates for 2011:

p <- gla + 
    geom_polygon(data = resp_merge, aes(x = long, y = lat, group = group, fill = hsr69_11), alpha = 0.5) + ggtitle("Glasgow - Hospital Respiratory Disease Admission Rate n(per 100,000) by data zone") + 
    scale_fill_continuous(low = "white", high = "red") + coord_map(projection = "bonne", lat0 = 58) + 
    theme(plot.title = element_text(face = "bold", size = 14), legend.position = "none") + 
    geom_path(data = resp_merge, aes(long, lat, group = group, fill = hsr69_11, alpha = 0.3), colour ="grey", lwd = 0.05)
plot(p)

plot of chunk unnamed-chunk-9

Step 4 – Plot hospital admission rates for all years and compare

Vectorise the columns containing counts and rates into simply variable and value by using the melt() function from the reshape2 package:

resp_melt <- melt(resp_merge, id.vars = c("id", "zonename", "order", "order", "row", "long", "hole", "lat", "group", "piece"))

For now, we are only interested in rates of hospital admissions, use the grep() function to select what we need:

resp_melt_rate <- resp_melt[grep("hsr", resp_melt$variable), ]

As before, use geom_polygon to define the aesthetic for the choropleth layer and append a facet layer for variable:

q <- gla + 
geom_polygon(data = resp_melt_rate, aes(x = long, y = lat, group = group, fill = value), alpha = 0.8) + facet_wrap(~variable) + 
ggtitle("Glasgow - Hospital Respiratory Disease Admission rate (per 100,000)n by data zone 2002-2011") + scale_fill_continuous(low = "white", high = "red") + 
    coord_map(projection = "bonne", lat0 = 58) + 
    theme(plot.title = element_text(face = "bold", size = 14))
plot(q)

plot of chunk unnamed-chunk-12

Further reading

http://futurecity.glasgow.gov.uk/

http://cran.r-project.org/web/packages/ggmap/index.html

http://cran.r-project.org/web/packages/dplyr/index.html

http://blog.rstudio.org/2014/01/17/introducing-dplyr/

http://cran.r-project.org/web/packages/reshape2/index.html

http://cran.r-project.org/web/packages/RgoogleMaps/index.html

http://www.ordnancesurvey.co.uk/

http://spatial.ly/2013/12/introduction-spatial-data-ggplot2/

 


 

Leave a Reply

Your email address will not be published. Required fields are marked *