Reproducibility of Clinical Trials: Part 2

August 17, 2020 | Anna

In the first part of this blog post, we focused on the importance of reproducibility in science and the problems we found when trying to reproduce a Randomised Controlled Trial (RCT). In this second part, we dive into the steps we followed to reproduce the analysis in the Aridhia DRE Workspace.

Randomised Controlled Trials (RCTs) are studies performed in humans to assess whether a new approach is safe and effective before it is applied in healthcare. To reduce sources of bias, subjects in an RCT are randomly allocated into two or more groups: the treatment group receives the intervention under assessment, while the other acts as the control and receives a placebo or the best current treatment for the condition. In addition, RCTs may be blinded, meaning that the participants, physicians, and investigators are unaware of the subjects’ group allocations.

The data we found available was from Mercaptopurine versus placebo to prevent recurrence of Crohn’s disease after surgical resection: a multicentre, double-blind, randomised controlled trial, also known as the TOPPIC study. This RCT was performed by the Clinical Trials Unit of the University of Edinburgh in 2016; it aimed to assess whether Mercaptopurine, compared to a placebo, could prevent or delay the postoperative clinical recurrence of Crohn’s Disease, a chronic illness that can involve any segment of the gastrointestinal tract, causing a great amount of pain and a continuous need for surgical interventions.

Because all the trial information was spread across 31 different .csv files, the first step was to filter and merge all the data into a single file. We wrote an R script that generated one .csv file containing all the information about the subjects’ baseline characteristics, such as age, gender, treatment allocation, and Crohn’s medical history. No information about the recurrence/outcome was added at that stage.
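The idea behind that merging step can be sketched in a few lines. Below is a minimal illustration in Python's standard `csv` module rather than the R script we actually used; the `subject_id` key and file layout are hypothetical stand-ins for the real TOPPIC column names:

```python
import csv
from glob import glob

def merge_trial_csvs(pattern, key="subject_id"):
    """Join several per-topic CSV files into one row per subject,
    matching rows on a shared subject identifier column."""
    merged = {}          # subject_id -> combined row dict
    fieldnames = [key]   # preserve first-seen column order
    for path in sorted(glob(pattern)):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                merged.setdefault(row[key], {}).update(row)
                for col in row:
                    if col not in fieldnames:
                        fieldnames.append(col)
    return fieldnames, list(merged.values())
```

In the real script, outcome columns would simply be excluded from the baseline file at this point.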

Within the Aridhia DRE Workspace, we easily converted the resulting .csv file into a dataset, before using in-Workspace functionality to add metadata describing the dataset and all the variables in it. This way, any person with access to it could understand the information contained in the dataset and perform their desired analyses. Then, we performed a preliminary data analysis using the Workspace’s built-in tools.

Once we knew the baseline characteristics of the population, we started the statistical analysis. The TOPPIC study follows the standard steps of a survival analysis, so the statistical test used to examine the results was a Cox Proportional Hazards Model, which yields a Hazard Ratio (HR); the “hazard” refers to the probability that an individual has an event at a given time. If the resulting HR is less than one, the results favour the experimental group.
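As a back-of-the-envelope illustration of what an HR below one means (toy numbers only, not the TOPPIC data): if each group's hazard is roughly constant, it can be approximated as events per unit of person-time, and the HR is simply the ratio of the two rates.

```python
def approx_hazard(events, person_years):
    # Under a constant-hazard (exponential) assumption, the hazard
    # is just the event rate per unit of follow-up time.
    return events / person_years

# Toy numbers: 10 recurrences over 200 person-years on treatment
# vs 20 recurrences over 200 person-years on placebo.
hr = approx_hazard(10, 200) / approx_hazard(20, 200)
print(hr)  # 0.5 -> the treatment group has half the recurrence hazard
```

A Cox model estimates this ratio without assuming constant hazards, and adjusts it for covariates such as those used in the TOPPIC analysis.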

All the analyses were performed in R, mainly using the survminer package; a cheat-sheet for this package is available for anyone interested. To perform a survival analysis, two basic elements are needed. First, a variable describing whether the subject had the outcome or not, normally coded as 1/0 meaning Yes/No, respectively. Second, a variable describing the time of the outcome for subjects who suffered recurrence, and the censored time for those who did not suffer recurrence during their participation in the trial. The censored times were the most difficult to obtain, as none of the existing variables in the dataset contained that precise information; we decided to use the last visit date and the status change date to build the censored times.
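Those two ingredients, an event indicator (1 = recurrence, 0 = censored) and a time, are all a Kaplan-Meier estimator needs. A minimal pure-Python sketch on toy data (the real analysis used R's survminer machinery, not this code):

```python
def kaplan_meier(data):
    """Kaplan-Meier survival estimate.
    `data` is a list of (time, event) pairs, where event is 1 if the
    subject had a recurrence at `time` and 0 if they were censored then.
    Returns [(event_time, survival_probability), ...]."""
    s, curve = 1.0, []
    for t in sorted({t for t, e in data if e == 1}):
        at_risk = sum(1 for ti, _ in data if ti >= t)          # still observed
        events = sum(1 for ti, e in data if ti == t and e == 1)
        s *= 1 - events / at_risk
        curve.append((t, s))
    return curve

# Toy follow-up times in months; (4, 0) and (11, 0) are censored subjects.
toy = [(2, 1), (4, 0), (6, 1), (9, 1), (11, 0), (14, 1)]
print(kaplan_meier(toy))
```

Censored subjects drop out of the risk set without forcing the curve down, which is exactly why getting the censored times right mattered so much in our replication.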

Although we wrote R scripts to perform the statistical analysis, we also developed a Shiny application, taking advantage of its integration within Aridhia’s DRE. The Shiny app performs an interactive survival analysis, enabling a quick visualisation of the results. It allows the user to rapidly build a baseline characteristics table, Kaplan-Meier graphs, and Cox models; moreover, it allows subgroup analyses by adding a filter on the data.

In the first part of this blog, we published our results compared to the original paper. The results were very similar, but not identical. The differences were a consequence of missing information about the original methods, in particular those that defined the censored times for each subject. However, after we finished this project, members of the research group that performed the original analysis replied to us, explaining the exact methods they had used to define the censored times. When we changed our code to use the same methods, the results did not drastically change from our first attempt. Note that the unadjusted HR is closer after the correction, but the adjusted HR is not; this could be due to anonymisation changes in the variables used to adjust the model. Overall, this was a very positive experience that allowed us to work with real clinical data, experiencing the challenges that this entails.

|                       | Adjusted HR                  | Unadjusted HR                 |
|-----------------------|------------------------------|-------------------------------|
| Original Study        | 0.54 (0.27 – 1.06), p = 0.07 | 0.53 (0.28 – 0.99), p = 0.046 |
| Replication           | 0.54 (0.27 – 1.10), p = 0.09 | 0.53 (0.28 – 1), p = 0.05     |
| Replication Corrected | 0.54 (0.27 – 1.11), p = 0.09 | 0.53 (0.28 – 0.99), p = 0.05  |

We would like to thank the chief investigator Professor Satsangi and the statisticians Dr Steff Lewis and Catriona Keerie for kindly replying to our emails and allowing a faithful reproduction of the trial.



Data Science intern at Aridhia. A first-year student in a Precision Medicine MSc.
