Is Federated Analysis the Way Forward for Genomics?

August 26, 2015 | Rodrigo

Last month, a group of leading researchers from the genomics community published a paper suggesting that “…between 100 million and as many as 2 billion human genomes could be sequenced by 2025, representing four to five orders of magnitude growth in ten years…”. That would make genomics the biggest of all the big data domains, reaching exabase scale within the next decade. [1]

A common concern within the genomics community is whether any single site holds enough data to draw valid scientific conclusions, let alone to carry that science forward into healthcare delivery. As comparable sequencing and variant-calling technologies that allow consistent analysis become available, new models of interaction are needed to bring vast amounts of data from multiple sites together in secure repositories and enable collaborative analysis on a global scale.


The Global Alliance for Genomics and Health (GA4GH) has been established to promote the sharing of human genetic data across multiple sites. The Alliance brings together almost 400 expert organisations from across the world to address some of the sharing and analysis issues the genomics community faces. The GA4GH has now laid the groundwork for a limited, three-month proof of concept (PoC), coordinated by Aridhia, that will trial federated genomic analysis as a way to address the computational challenges facing the healthcare industry.

Is federated analysis the answer?

Federated analysis is the ability to analyse distributed data without physically sharing it. That makes it a natural foundation for a globally fragmented genomics community whose data is stored in isolated databases.

Whereas some sharing models rely on pooling data, whether in a private, open or commercial framework, federated models aim to respect important local legal, privacy and consent arrangements by keeping the relevant data in local storage and reducing the need for it to travel. Researchers can then access a larger ‘virtual’ dataset comprising information – if not data – from multiple sites, on which analyses can be run simultaneously, thereby increasing research efficiency.

Federated analysis therefore promises researchers access to larger sample sizes, enabling large-scale data comparisons that yield better insight and drive improvement.
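To make the idea of sharing information rather than data more concrete, here is a minimal Python sketch of one of the simplest federated analyses: estimating a pooled allele frequency. Each site computes aggregate counts locally and only those counts are combined, so individual genotypes never leave the site. The site names, the in-memory genotype lists, `SiteSummary` and the helper functions are all hypothetical stand-ins for real local stores and a real network API; the per-site queries are fanned out concurrently with a thread pool to mirror analyses being run simultaneously.

```python
"""Minimal sketch of federated allele-frequency estimation.

Assumption: each site exposes only aggregate counts for a variant
(allele count and allele number), never individual genotypes.
Site names and data are hypothetical, for illustration only.
"""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    site: str
    allele_count: int   # copies of the alternate allele observed locally
    allele_number: int  # total alleles genotyped locally (2 per diploid sample)


# Hypothetical local genotype stores: 0, 1 or 2 alternate alleles per sample.
# In a real federation these would stay inside each site's own infrastructure.
LOCAL_GENOTYPES = {
    "site_a": [0, 1, 0, 2, 0],
    "site_b": [1, 1, 0, 0],
    "site_c": [0, 0, 0, 0, 0, 1],
}


def summarise_locally(site: str) -> SiteSummary:
    """Runs at the data owner's site and returns counts only."""
    genotypes = LOCAL_GENOTYPES[site]
    return SiteSummary(site, sum(genotypes), 2 * len(genotypes))


def federated_allele_frequency(sites: list[str]) -> float:
    """Query all sites concurrently and pool their summary counts."""
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarise_locally, sites))
    total_ac = sum(s.allele_count for s in summaries)
    total_an = sum(s.allele_number for s in summaries)
    return total_ac / total_an


if __name__ == "__main__":
    freq = federated_allele_frequency(["site_a", "site_b", "site_c"])
    print(f"pooled alternate allele frequency: {freq:.3f}")
```

Pooling the raw counts, rather than averaging each site's own frequency, weights every site by the number of alleles it contributes. The sketch ignores disclosure risk from very small counts, which a real federation would have to address.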

Why not?

There is some concern that federated models are too rigid or too expensive to implement, and that they fail to meet two basic criteria: (a) encouraging data owners to participate and (b) being useful to analysts as end users. Yet extensive federated searches already exist in commercial settings, such as the flight and hotel APIs used in travel booking, so there may be lessons to learn from the factors behind their success. The GA4GH Genomics API is defined as:

The API is implemented as a webservice to create a data source which may be integrated into visualization software, web-based genomics portals or processed as part of genomic analysis pipelines. It overcomes the barriers of incompatible infrastructure between organizations and institutions to enable DNA data providers and consumers to better share genomic data and work together on a global scale, advancing genome research and clinical application.
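As a rough illustration of how such a web service might be consumed as part of an analysis pipeline, the following Python sketch pages through a variant search using an opaque page token. The request and response field names (`referenceName`, `start`, `end`, `pageSize`, `pageToken`, `variants`, `nextPageToken`) are modeled loosely on the GA4GH schemas and should be treated as assumptions; `search_variants` and `make_stub_server` are hypothetical helpers, and the stub stands in for a real site's HTTPS endpoint so the example runs offline.

```python
"""Sketch of a client paging through a variant search on a
GA4GH-style web service.

Assumptions: the request/response field names below are illustrative,
not tied to a specific schema version. The transport is a stub so the
example runs offline; a real client would POST JSON over HTTPS instead.
"""
import json
from typing import Callable, Iterator, Optional

# A transport takes a JSON request body and returns a JSON response body.
Transport = Callable[[str], str]


def search_variants(transport: Transport, reference_name: str,
                    start: int, end: int, page_size: int = 2) -> Iterator[dict]:
    """Yield every variant with start in [start, end), following page tokens."""
    page_token: Optional[str] = None
    while True:
        request = {
            "referenceName": reference_name,
            "start": start,
            "end": end,
            "pageSize": page_size,
            "pageToken": page_token,
        }
        response = json.loads(transport(json.dumps(request)))
        yield from response["variants"]
        page_token = response.get("nextPageToken")
        if not page_token:  # no token means this was the last page
            break


def make_stub_server(variants: list[dict]) -> Transport:
    """Hypothetical in-memory 'site' that serves matching variants page by page."""
    def handle(body: str) -> str:
        req = json.loads(body)
        hits = [v for v in variants
                if v["referenceName"] == req["referenceName"]
                and req["start"] <= v["start"] < req["end"]]
        offset = int(req["pageToken"] or 0)  # token encodes the next offset
        page = hits[offset:offset + req["pageSize"]]
        next_offset = offset + req["pageSize"]
        token = str(next_offset) if next_offset < len(hits) else None
        return json.dumps({"variants": page, "nextPageToken": token})
    return handle


if __name__ == "__main__":
    site = make_stub_server([
        {"referenceName": "1", "start": 100},
        {"referenceName": "1", "start": 250},
        {"referenceName": "1", "start": 900},
        {"referenceName": "2", "start": 150},
    ])
    for v in search_variants(site, "1", 0, 1000):
        print(v["referenceName"], v["start"])
```

Keeping the transport as a plain function means the same client code could be pointed at several sites in turn, which is the basic building block of a federated query.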

With a reference implementation available, these efforts are now sufficiently developed to support a proof of concept in which a number of sites implement a first level of the API for a limited period and document the experience. This should improve communication and understanding of how federated analyses might work in practice, as well as providing useful feedback on the APIs themselves.

The primary objectives of this PoC are to increase practical understanding of federation strategies through a set of simulated test cases, and to document the experience for the benefit of the community. A secondary goal is to provide feedback on how the Genomics API model could be extended to capture the requirements of clinical transactions as well.

We are excited to be working with groups around the world, including the Stratified Medicine Scotland Innovation Centre, to run this PoC later in the summer. You can download the Federated Analysis paper to read more about the proposal and its rationale, and please get in touch if you have any specific questions.


[1] Zachary D. Stephens, Skylar Y. Lee, Faraz Faghri, Roy H. Campbell et al. Big Data: Astronomical or Genomical? PLoS Biol. 2015 Jul 7;13(7):e1002195. doi:10.1371/journal.pbio.1002195. http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195

Rodrigo Barnes

Chief Technology Officer

Rodrigo Barnes is CTO at Aridhia, which he joined in 2007. He is an R&D software engineer with a mathematical background and has designed and developed analytical and data management applications at a number of healthcare, life sciences and knowledge management start-ups.

He is skilled at developing new approaches to healthcare information problems. Rodrigo has been instrumental in designing Aridhia’s approach to big data in healthcare and in developing our collaborative analytics service. He is now responsible for technical and product strategy and takes a lead in Aridhia’s approach to precision medicine. He is currently working on Aridhia’s platform for genomics, including executing and evaluating annotation pipelines for clinical use.

Since 2007, Rodrigo has led a range of programmes, from core technical design to the on-the-ground delivery of collaborative projects involving multiple academic and healthcare organisations, including:

  • Online dynamic anonymisation of patient records
  • Patient pathway analysis for NHS Referral to Treatment analysis & reporting
  • Real-time infection control & bed management information
  • Clinical analytics dashboards
  • A platform for chronic disease management and professional education of clinical staff
  • The collaborative analytics service, AnalytiXagility
  • Tools to search, integrate and process data with controlled vocabularies and clinical ontologies
