
Machine Learning in a trusted research environment

The use of artificial intelligence systems in medicine is expanding rapidly and is already beginning to displace some aspects of clinical diagnostics and clinical trials. In a healthcare industry increasingly focused on precision medicine, the processes for developing, testing and deploying machine learning applications are receiving significantly more attention.

Machine Learning (ML) is a subset of Artificial Intelligence, the science of having computers make more independent decisions, or make them in a more “human” way. ML research focuses on training a computer to “learn” and recognise patterns in data, and then to find those patterns in new data in order to classify or categorise it. For example, we have already seen that we can feed brain scans to a machine and train it to recognise the early signs of Alzheimer’s dementia. In its most basic form, the process has two steps: the computer is first trained to see patterns (e.g. it is fed brain scans of study participants who are known to have a disease), and it is then fed new data or images in which it looks for those patterns (e.g. new scans of patients showing early signs of Alzheimer’s). Machines have the advantage over humans of being able to analyse complex data quickly and consistently and to give feedback to human consultants, who are then armed with more information to diagnose or treat individuals. Speed and reliability are good features of ML, but they are not sufficient for medical use cases. That is why ML needs a wider context of development and validation in a clinical research setting before it is applied in the clinic, where regulation also needs to be considered.
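The train-then-classify idea described above can be sketched in a few lines of Python. This is only a toy illustration, not how a clinical model would be built: it uses a simple nearest-centroid rule, and the function names, labels and two-number "measurements" are all made up for the example rather than taken from any real scan data.

```python
from statistics import mean


def train_centroids(samples, labels):
    """Learn one centroid (mean feature vector) per label from labelled examples."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical two-feature measurements with known labels (the training step).
training_samples = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.2], [3.2, 2.9]]
training_labels = ["control", "control", "disease", "disease"]
centroids = train_centroids(training_samples, training_labels)

# A new, unlabelled measurement (the classification step).
new_prediction = predict(centroids, [2.9, 3.0])
print(new_prediction)
```

Real diagnostic models are far more complex, but the shape is the same: learn from labelled examples first, then apply what was learned to data the model has not seen.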

Within the Digital Research Environment (DRE), we can see this process in action. Users can collect their data (or find it in our FAIR Data Services), organise it using basic SQL or R scripting, and then visualise it using our data table analytics module. We see more advanced analysis of data within Aridhia’s Workspaces through the use of R packages, which are used to create R scripts and Shiny apps. These packages are easily accessed directly from a CRAN mirror. Having organised their data, users can begin to build and deploy ML models. ML models can be built in VMs and trained to recognise patterns in the Workspace data; we tend to find that machines with GPUs provisioned give great performance here. Further Workspace data can then be used to test and run models.
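As a rough sketch of the "organise, then train, then test" flow described above, the example below uses Python's built-in sqlite3 module as a stand-in for whichever SQL engine is available. The table name, column names and values are hypothetical, and the split rule (hold out every third cleaned record) is just one simple, deterministic choice.

```python
import sqlite3

# In-memory database standing in for a Workspace table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (participant_id INTEGER, marker REAL, diagnosis TEXT)"
)
rows = [
    (1, 0.9, "control"),
    (2, None, "control"),  # incomplete record, filtered out below
    (3, 3.1, "disease"),
    (4, 1.1, "control"),
    (5, 2.8, "disease"),
    (6, 3.3, "disease"),
]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)

# Organise: keep complete records only, in a stable order.
clean = conn.execute(
    "SELECT participant_id, marker, diagnosis FROM measurements "
    "WHERE marker IS NOT NULL ORDER BY participant_id"
).fetchall()
conn.close()

# Hold out every third cleaned record for testing; train on the rest.
test_rows = clean[2::3]
train_rows = [row for i, row in enumerate(clean) if i % 3 != 2]

print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Keeping the test records separate from the training records is what makes the later evaluation meaningful: the model is checked against data it was not trained on.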

Workspaces allow the deployment of these models in Microsoft Azure’s ML services, and we have already seen the potential of using data in our DRE to analyse near-real-time patient activity and progress. As touched upon in this UCL Hackathon and our previous FHIR blog post, Aridhia has enjoyed working closely with the team at Great Ormond Street Hospital (GOSH) and Project Fizzyo, who used Azure’s ML services to receive real-life patient data, sort and filter the data for interesting markers, and then identify patterns in the data that researchers could analyse further.

Our enablement team has also been working on integrating further ML products into our platform. We created a Workspace with Cromwell on Azure enabled so that we could test the idea of putting data stored in the DRE through a complex bioinformatics workflow. Cromwell was developed by the Broad Institute and is an open-source tool for orchestrating the tasks needed for genomic analysis. We were happy to prove that integration with our Workspace was very easy to achieve and that Cromwell could be used with data stored in the Aridhia DRE. Tools like this will help to enhance the project lifecycle within the Workspace: capture your raw data and upload it automatically, then use the built-in tools to sanity check and organise it for processing before sending it to cloud computing services for transformation. Once the data is ready, you can use the Workspace to send it to further machine learning services for analysis and feature finding. All of this happens from one central space that is accessible from anywhere.

Aridhia’s approach to these challenges is to develop ways in which we can bundle advanced data science capabilities into our Workspaces, so that users with specialised skill sets (whether in rapid prototyping, disease modelling, etc.) can take advantage of them without ever leaving their secured working environment. Using these services should be an indispensable step in the research lifecycle, and we aim to make this as convenient as possible for our users. As demand for Machine Learning tools and services increases within the digital health sector, the DRE will grow and expand to meet it.


Laura Shishodia
