R for Researchers: practical data analysis in AnalytiXagility


Welcome to the first post in a series on using the R programming language within AnalytiXagility as a researcher: why it’s a good idea, and how it can enhance the productivity and reproducibility of your research.

Modern health research has changed dramatically in recent years. Conducting research now involves dealing with increasing amounts of complex data generated by new technologies, especially in the “-omics”: genomics, proteomics, lipidomics and so on. Beyond those areas, the amount of data generated by a single experiment is on an ever-upward trend. These increasingly data-intensive fields require researchers to take data-driven approaches, backed by the right tools. The end goal is to speed up the translational research process, shortening the time to clinical insight so that your research can improve the quality of care for a patient sooner. A worthwhile goal, I’m sure you would agree.

R is a key tool for analysing and visualising data. Originally designed for statistical and graphical data analysis, R has matured as its user base has grown, and it is now a powerful one-stop shop for most tasks a data scientist or researcher is likely to need.

Before jumping into the various ways R can improve your research process, I’ll briefly introduce myself. Hopefully this will help show the angle I’m approaching this from, and why I think R is so useful for researchers and data scientists.

Trial and error – my research process

I originally trained as a biochemist as an undergraduate and developed an interest in biomedical sciences. In 2013 I was accepted to run an NIHR-funded PhD project looking at vascular complications in childhood Inflammatory Bowel Disease. This was far more translational than anything I had done before, and it introduced me to increasingly complex statistics, which I needed to handle the real-world patient data I was collecting. I managed this, as a lot of researchers do, with numerous spreadsheets decipherable only by myself, accompanied by a huge SPSS database for my multi-parametric analyses.

Now working at Aridhia, I feel qualified to proclaim this an inefficient system, but it’s one that thousands of researchers are using all over the world on any given day.

The data I had created was fathomable only to me; no-one else could easily work with or decipher it. Complex analyses had to be carefully recreated from scratch for any new data. This required a detailed note-taking system to ensure I was doing the same thing to each batch of data, and transformations performed in Excel weren’t easily reproducible because there was no definitive record of what I’d done. Data collected during clinics was handwritten and then manually transcribed into my SPSS database, leaving room for errors. In short, I’m lucky that my study had small participant numbers, or I would have had little chance of finishing in time!

Using R in clinical research

Today, as a member of Aridhia’s Enablement team, I help people like my former self to jump into the world of data analysis in our AnalytiXagility platform, which comes with built-in R capabilities.

Working with a well-written collection of R scripts has obvious benefits over the arcane folder trees of badly labelled Excel sheets that most researchers end up with. Reproducibility becomes a background by-product of your research, rather than an additional burden. Saving time in analysis, easily reproducing analyses, and producing clear and comprehensive documentation of your research lifecycle make it easier to share your insights and improve your confidence in them when they are submitted to peer review. All in all, R provides benefits across the board.
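To make the reproducibility point concrete, here is a minimal sketch in base R of how a script can record a transformation once and then apply it identically to every batch of data. Everything in it is hypothetical and purely for illustration: the `clean_batch` function, the `measurement` column and the two made-up batches are my own assumptions, not data or code from a real study.

```r
# Hypothetical cleaning step: all names and values here are illustrative.
clean_batch <- function(batch) {
  # Drop rows where the measurement is missing
  batch <- batch[!is.na(batch$measurement), ]
  # Log-transform the measurement (natural log)
  batch$log_measurement <- log(batch$measurement)
  batch
}

# Two made-up batches standing in for data collected on different days
batch_1 <- data.frame(id = 1:3, measurement = c(1, 10, NA))
batch_2 <- data.frame(id = 4:5, measurement = c(100, 1))

# Apply exactly the same steps to every batch, then stack the results
cleaned  <- lapply(list(batch_1, batch_2), clean_batch)
combined <- do.call(rbind, cleaned)

print(combined)
```

Because the steps live in the script rather than in a notebook or in someone’s memory, a new batch only needs to be added to the list, and the script itself doubles as the record of what was done.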

In addition, R’s functionality as a statistical tool and its active user base mean that there is a specific package, or group of tailor-made functions, written for pretty much any task you might want to do. Many of these packages are distributed through the Comprehensive R Archive Network repository, or CRAN as it’s known. I’ll signpost more resources available to R users in an upcoming blog.

Today I can say that if I had taken the time to learn R at the beginning of my PhD, I would have saved myself a whole heap of time and stress, and finished my project with an easily navigable, auditable and reproducible dataset to pass on to future researchers. Over the next few weeks I’ll demonstrate how you can use R within AnalytiXagility and how it might help you avoid the same pitfalls that I encountered!