In late 2016 we demonstrated the potential power of federated analysis of genomics data, showing how multiple sites from across the world can collaborate and enrich their information (see the linked blog post for details and a video).
This paper details the approach taken by the collaboration involved in this test.
In July 2015 four of the world’s leading researchers from within the genomics community published a paper suggesting that “…between 100 million and as many as 2 billion human genomes could be sequenced by 2025, representing four to five orders of magnitude growth in ten years…”, making genomics the biggest of all the big data domains and reaching exabase scale within the next decade.
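The exabase figure is easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes a round ~3.2 billion bases per haploid human genome and ~30x sequencing coverage; both are illustrative assumptions of ours, not figures taken from the paper.

```python
# Back-of-envelope check of the "exabase-scale" claim.
GENOME_BASES = 3.2e9   # bases per human genome (approximate round figure)
COVERAGE = 30          # typical whole-genome sequencing depth (assumed)
EXABASE = 1e18         # bases in one exabase

# The paper's low and high projections for genomes sequenced by 2025.
for genomes in (100e6, 2e9):
    raw_bases = genomes * GENOME_BASES * COVERAGE
    print(f"{genomes:.0e} genomes -> ~{raw_bases / EXABASE:.0f} exabases of raw reads")

# 1e+08 genomes -> ~10 exabases of raw reads
# 2e+09 genomes -> ~192 exabases of raw reads
```

Even the conservative end of the projection lands comfortably in exabase territory for raw sequence alone, before any downstream analysis products are counted.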
A common concern within the genomics community, and one shared by the authors of the aforementioned paper, is whether any one site holds sufficient data to reach valid scientific conclusions, let alone to move beyond the science into its application in healthcare delivery. As comparable sequencing and variant calling technologies become available that allow consistent analysis, new models of interaction are required which facilitate the collection of vast amounts of data from multiple sites into secure repositories, enabling collaborative analysis on a global scale. At its heart this is about clinical communities pooling their knowledge of relevant variants, as the sketch below illustrates.
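As a minimal sketch of that pooling model: each site answers an aggregate query locally, and only summary counts cross the site boundary, so individual-level genotypes never leave the site that holds them. The site structure, variant identifier and query shape here are hypothetical illustrations of ours, not the interfaces used in the 2016 test.

```python
from dataclasses import dataclass

@dataclass
class SiteCounts:
    """Aggregate answer a single site returns for one variant query."""
    allele_count: int   # copies of the alternate allele observed at the site
    allele_number: int  # total alleles genotyped at the site

def query_site(site_db: dict, variant_id: str) -> SiteCounts:
    """Stand-in for a remote call: the site computes counts locally and
    returns only the aggregate, never individual-level genotypes."""
    ac, an = site_db.get(variant_id, (0, 0))
    return SiteCounts(allele_count=ac, allele_number=an)

def federated_allele_frequency(sites: list, variant_id: str) -> float:
    """Pool aggregate counts from every participating site into a
    global allele frequency estimate."""
    total_ac = total_an = 0
    for site_db in sites:
        counts = query_site(site_db, variant_id)
        total_ac += counts.allele_count
        total_an += counts.allele_number
    return total_ac / total_an if total_an else 0.0

# Three hypothetical sites, each holding only its own cohort's counts
# for one variant (the identifier is chosen purely for illustration).
sites = [
    {"chr17:g.43093464G>A": (12, 5000)},
    {"chr17:g.43093464G>A": (7, 3200)},
    {"chr17:g.43093464G>A": (21, 9800)},
]
print(federated_allele_frequency(sites, "chr17:g.43093464G>A"))  # ~0.00222
```

The design point is that only aggregates are shared: each cohort's raw data stays behind its own site's governance, while the pooled counts give every participant statistical power no single site could achieve alone.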
As the sequencing data created independently by multiple projects across the world that aim to map genetic variation (such as the US Precision Medicine Initiative) grows at an exponential rate, our ability to adequately store, share and analyse this data becomes an increasingly urgent issue, one which requires early and detailed consideration of the infrastructure needed to support future growth in the domain.