February 27, 2014 | Alan
In analytics, run charts are used to identify and display trends in data over time. In this post we show how to use the AnalytiXagility platform to plot a basic run chart to visualise features of A&E attendance data over time.
Related Blog Posts
This post is one of a series that is intended to be read in sequence. The following posts provide a full description of:
- The Use of Run Charts in Health Informatics
- Visualising Features of A&E Waiting Time Data Using Run Charts
- Visualising Variant CPs due to changing targets of A and E waiting time data using run charts
The data we have used in this article has been sourced from NHS England: Weekly A&E SitReps 2013-14. This data has been cleaned up, converted to a CSV file, and then loaded into the AnalytiXagility platform. You can ask for a copy to be made available to your workspace. The data is called ae_england_20140105 in the AnalytiXagility workspace.
In this post we:
- Demonstrate techniques used in the construction of a layered plot, including formatting labels and annotating the plot
- Explore how date data is handled and manipulated
Step 1 – Set up environment
Load the relevant libraries:
library(ggplot2)
library(scales)
Import the data into the R console workspace:
ae_data <- xap.read_table("ae_england_20140105")
Use the head() function to have a quick look at a section of the imported data:
head(ae_data)
The imported dataset ae_data includes percentage values of A&E attendees seen within 4 hours. The target, as set by NHS England, dictates that 95% of attendees are seen within this time. Let’s look first at fluctuations around the target over time; to do this, we first use the class() function to examine the data types of the two columns we need, period and percentagein4hoursorless_type1:
##  "
##  "numeric"
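This tells us that the metric is numeric, and that period is not yet stored as POSIXct, which is an S3 class for datetime representation, so it has to be converted before plotting.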
Define date and metric variables:
run_date <- as.POSIXct(ae_data$period, format = "%d/%m/%Y")
run_metric <- ae_data$percentagein4hoursorless_type1
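As a minimal, self-contained sketch of this conversion, the following uses a few made-up date strings (not the real ae_data) in the same day/month/year layout. The tz = "UTC" argument is an addition for reproducibility and is not part of the original code.

```r
# Hypothetical sample values standing in for ae_data$period
period <- c("06/01/2013", "13/01/2013", "20/01/2013")
class(period)      # character before conversion

# Parse day/month/year strings; tz is fixed only so the result is reproducible
run_date <- as.POSIXct(period, format = "%d/%m/%Y", tz = "UTC")
class(run_date)    # now a POSIXct date-time vector

# A string that does not match the format becomes NA rather than raising an error
bad_date <- as.POSIXct("2013-01-06", format = "%d/%m/%Y", tz = "UTC")
is.na(bad_date)
```

Checking for NA values after conversion is a quick way to spot rows whose dates were written in a different layout.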
To select a specific time period, use the format() function to extract the year from run_date and create an index. We will look at A&E attendances in 2012 and 2013, so we apply the derived index to re-evaluate run_date and run_metric:
idx <- format(run_date, "%Y") == "2013" | format(run_date, "%Y") == "2012"
run_date <- run_date[idx]
run_metric <- run_metric[idx]
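The same kind of index can be written a little more compactly with %in%. The sketch below is an alternative phrasing run on made-up dates and values, not the code used for the post's plots.

```r
# Hypothetical dates and metric values, for illustration only
run_date <- as.POSIXct(
  c("2011-12-25", "2012-06-03", "2013-03-10", "2014-01-05"), tz = "UTC"
)
run_metric <- c(96.1, 95.4, 93.8, 94.9)

# Logical index: TRUE where the year is 2012 or 2013
idx <- format(run_date, "%Y") %in% c("2012", "2013")

# Apply the same index to both vectors so they stay aligned
run_date <- run_date[idx]
run_metric <- run_metric[idx]
run_metric   # 95.4 93.8
```

Using one index for both vectors matters: filtering only one of them would leave the dates and values out of step.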
As outlined in The Fundamentals of ggplot Explained, ggplot2 operates on data represented as a data frame. For plotting purposes, save the data as a data frame and name the columns accordingly:
to_plot <- data.frame(run_date, run_metric)
colnames(to_plot) <- c("run_date", "run_metric")
Step 2 – Derive basic statistics
Calculate the CP (centre point) line as described in the documentation. For a run chart, this is the median:
CP <- median(to_plot$run_metric, na.rm = TRUE)
Compute the upper limit (UL) and lower limit (LL) of this dataset, taken here as three standard deviations either side of the CP:
std_dev <- sd(to_plot$run_metric, na.rm = TRUE)
three_sd <- 3 * std_dev
UL <- CP + three_sd
LL <- CP - three_sd
The target for A&E attendees is 95%:
target <- 95
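To make the arithmetic in this step concrete, here is a small worked example on made-up percentages (one of them missing). The numbers are illustrative and do not come from the A&E dataset.

```r
# Hypothetical weekly percentages, including one missing value
run_metric <- c(94, 96, 95, NA, 93, 97)

CP <- median(run_metric, na.rm = TRUE)    # centre point: 95
std_dev <- sd(run_metric, na.rm = TRUE)   # sample standard deviation: about 1.58
UL <- CP + 3 * std_dev                    # about 99.74
LL <- CP - 3 * std_dev                    # about 90.26
```

Without na.rm = TRUE, both median() and sd() would return NA as soon as a single week is missing.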
Step 3 – Ready, set, plot!
In the recent blog post The Fundamentals of ggplot Explained, we introduced the basic concepts of ggplot2 and layering. We can start constructing our plot by creating a ggplot() object with labels and setting the colour scheme:
a <- ggplot() +
  ggtitle("Run Chart") +
  xlab("Time stamp") +
  ylab("Metric") +
  theme(panel.background = element_rect(fill = "white", colour = "black"))
Add some layers to a. Feed the data frame to_plot into geom_point() and geom_line(), and set the aesthetic values x and y:
b <- a +
  geom_point(data = to_plot, aes(x = run_date, y = run_metric)) +
  geom_line(data = to_plot, aes(x = run_date, y = run_metric))
b
We can add features to the run chart by using geom_hline(); for example, to add the CP, UL, LL and target information derived in Step 2 as layers to plot b:
c <- b +
  geom_hline(aes(yintercept = CP), colour = "red") +
  geom_hline(aes(yintercept = UL), colour = "black") +
  geom_hline(aes(yintercept = LL), colour = "black") +
  geom_hline(aes(yintercept = target), colour = "darkgreen")
c
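For readers without access to the AnalytiXagility workspace, here is a minimal sketch of the same layering on made-up data. It assumes only that ggplot2 is installed, and it passes yintercept as a fixed value rather than through aes(), which is an alternative way to draw a constant reference line.

```r
library(ggplot2)

# Made-up weekly data, for illustration only
set.seed(1)
to_plot <- data.frame(
  run_date   = seq(as.POSIXct("2012-01-01", tz = "UTC"), by = "week", length.out = 52),
  run_metric = rnorm(52, mean = 95, sd = 1)
)
CP <- median(to_plot$run_metric)
target <- 95

# Points and connecting line, then one horizontal line per reference value
p <- ggplot(to_plot, aes(x = run_date, y = run_metric)) +
  geom_point() +
  geom_line() +
  geom_hline(yintercept = CP, colour = "red") +
  geom_hline(yintercept = target, colour = "darkgreen") +
  labs(title = "Run Chart", x = "Time stamp", y = "Metric")
p
```

Setting the data and aesthetics once in ggplot() lets geom_point() and geom_line() inherit them, which avoids repeating the same arguments in each layer.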
For more information:
- See Visualising Variant CPs due to changing targets of A and E waiting time data using run charts to learn how to apply varying targets to run chart data.
The date format can be manipulated using functions from the scales library. We can vary the number of breaks on the x-axis and how they are visually represented using scale_x_datetime():
d <- c +
  scale_x_datetime(breaks = date_breaks("months"), labels = date_format("%b-%Y"))
d
In plot d there are overlapping x-axis labels. To remedy this, use theme() to rotate text labels. In this case we have chosen 90 degrees:
e <- d +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
e
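In more recent versions of ggplot2, scale_x_datetime() also accepts the break width and label format directly as strings through its date_breaks and date_labels arguments. The sketch below shows that form on made-up data; it assumes a ggplot2 release that has these arguments, and it is an alternative to the scales helpers used in this post.

```r
library(ggplot2)

# Made-up weekly data, for illustration only
to_plot <- data.frame(
  run_date   = seq(as.POSIXct("2012-01-01", tz = "UTC"), by = "week", length.out = 26),
  run_metric = seq(93, 98, length.out = 26)
)

p <- ggplot(to_plot, aes(x = run_date, y = run_metric)) +
  geom_line() +
  # One break per month, labelled with abbreviated month and four-digit year
  scale_x_datetime(date_breaks = "1 month", date_labels = "%b-%Y") +
  # Rotate the labels; vjust = 0.5 lines them up with the tick marks
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
p
```

The format codes are the usual strftime ones, so "%b-%Y" can be swapped for, say, "%d/%m" when the plotted period is shorter.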
Step 4 – Annotate plot
We can add text to a plot using a geom_text() layer, so let’s append the statistic values derived in Step 2 to plot e. Each text group must have defined x and y co-ordinates, and we can use the data to identify these.
UL_text <- paste("UL = ", round(UL, 2), sep = "")
LL_text <- paste("LL = ", round(LL, 2), sep = "")
CP_text <- paste("CP = ", CP, sep = "")
target_text <- paste("target = ", target, sep = "")
x_coord <- as.POSIXct("2012-02-01")
f <- e +
  geom_text(aes(x = x_coord, y = UL - 0.5, label = UL_text), color = "black") +
  geom_text(aes(x = x_coord, y = LL - 0.5, label = LL_text), color = "black") +
  geom_text(aes(x = x_coord, y = target + 1.5, label = CP_text), color = "red") +
  geom_text(aes(x = x_coord, y = target + 2, label = target_text), color = "darkgreen")
f
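If the labels should always show the same number of decimal places, sprintf() is one option for building them. This is a sketch with made-up statistic values, offered as an alternative to the paste() and round() combination above.

```r
# Hypothetical statistic values, for illustration only
UL <- 99.74342
LL <- 90.25658
CP <- 95

# "%.2f" always prints two decimal places, padding with zeros where needed
UL_text <- sprintf("UL = %.2f", UL)   # "UL = 99.74"
LL_text <- sprintf("LL = %.2f", LL)   # "LL = 90.26"
CP_text <- sprintf("CP = %.2f", CP)   # "CP = 95.00"
```

By contrast, round(95, 2) pasted into a string gives "95", so labels built with round() can differ in width from one statistic to the next.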
Step 5 – Tracing changes
Run charts are used extensively in analytics to track results from an implemented change. Let’s annotate our plot to reflect an implemented change and discuss the results. We first look at a single date, and then at a change over a period of time. To set a date to highlight on our plot and the text to label it, use:
date_implemented <- as.POSIXct("2012-08-01")
text_implemented <- "Change implemented"
Use geom_vline() and geom_text() to add a line and text to represent the implemented change, setting the text just below the upper limit (UL):
g <- f +
  geom_vline(aes(xintercept = as.numeric(date_implemented)), colour = "blue") +
  geom_text(aes(x = date_implemented + 120, y = UL - 2, label = text_implemented), color = "blue")
g
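One detail worth keeping in mind when positioning the label: arithmetic on a POSIXct value is carried out in seconds, so date_implemented + 120 moves the label two minutes, not 120 days. The sketch below shows the difference; label_date is a hypothetical name used only for this illustration, and tz = "UTC" is added for reproducibility.

```r
date_implemented <- as.POSIXct("2012-08-01", tz = "UTC")

# Adding a bare number shifts the time by that many seconds
two_minutes_later <- date_implemented + 120
format(two_minutes_later, "%H:%M:%S")   # "00:02:00"

# To shift by days, state the unit explicitly with a difftime
label_date <- date_implemented + as.difftime(120, units = "days")
format(label_date, "%Y-%m-%d")          # "2012-11-29"
```

On an axis spanning two years a two-minute shift is invisible, so an explicit difftime (or the hjust argument of geom_text()) is the more predictable way to nudge a label sideways.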
To highlight a region of change, draw a transparent rectangle using geom_rect(), and feed in minimum and maximum x and y values. Define the start and end dates:
start_date <- as.POSIXct("2013-06-01")
end_date <- as.POSIXct("2013-07-01")
Finally, we layer this onto plot g:
h <- g +
  geom_rect(aes(alpha = "period", xmin = start_date, xmax = end_date, ymin = -Inf, ymax = Inf),
            fill = "green", colour = "green")
h
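When no legend entry is wanted for the highlighted period, annotate("rect", ...) is another way to draw a single shaded band. The sketch below uses made-up data and a fixed alpha of 0.2, both chosen here for illustration; it is an alternative to the geom_rect() call above, not a replacement for it.

```r
library(ggplot2)

# Made-up weekly data, for illustration only
to_plot <- data.frame(
  run_date   = seq(as.POSIXct("2013-01-06", tz = "UTC"), by = "week", length.out = 52),
  run_metric = seq(93, 97, length.out = 52)
)
start_date <- as.POSIXct("2013-06-01", tz = "UTC")
end_date   <- as.POSIXct("2013-07-01", tz = "UTC")

p <- ggplot(to_plot, aes(x = run_date, y = run_metric)) +
  geom_line() +
  # One semi-transparent band spanning the full height of the panel
  annotate("rect", xmin = start_date, xmax = end_date,
           ymin = -Inf, ymax = Inf, fill = "green", alpha = 0.2)
p
```

Mapping alpha inside aes(), as in the post's code, produces a legend entry labelled "period"; setting it as a fixed value does not.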
We delve further into the world of run charts in the next blog post in the series, where we build an algorithm that captures special cases to identify trends in the data and write it as a function.