Our previous blog on Aridhia’s work as part of the PHEMS consortium introduced the Federated Node and gave a high-level view of its integration with the Aridhia DRE. This blog provides further information on the Federated Node, including its release, licensing, and upcoming development priorities.
Data protection laws like GDPR make data sharing across national borders and institutional boundaries increasingly difficult, restricting the ability of researchers to share data with colleagues outside of their organisation.
Federated Data Sharing allows researchers to analyse data that they cannot access directly. The video below details a three-tier model for understanding data federation:
The Federated Node (FN) is a component used for running federated tasks, and is based on three existing open-source projects:
• The Common API
• Keycloak
• Nginx
The Common API provides the structure of the API calls, Keycloak is used for token and user management, and Nginx is used as a reverse proxy. The FN needs to be deployed to a Kubernetes cluster, and requires a Postgres database for storing user credentials.
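As an illustration of the role Keycloak plays here, the sketch below builds (but does not send) a token request against Keycloak's standard OpenID Connect token endpoint. The host, realm name, client ID and credentials are hypothetical placeholders, and whether the FN expects clients to call Keycloak directly or wraps this behind its own API is not covered in this post. The password grant shown here also assumes that direct access grants are enabled for the client.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, realm: str, client_id: str,
                        username: str, password: str) -> Request:
    """Build an OpenID Connect password-grant token request for Keycloak.

    All argument values are supplied by the caller; nothing here is
    specific to the Federated Node.
    """
    # Standard Keycloak token endpoint (older Keycloak releases prefix
    # the path with /auth).
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_token_request(
    "https://fn.example.org", "example-realm", "example-client",
    "alice", "not-a-real-password",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. The JSON response from Keycloak contains an `access_token` field, which is typically sent as a bearer token on later API calls.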
A deployed Federated Node needs to be associated with a Docker container registry. All analytical tasks that can run against the federated data are hosted as images in this registry, and approved users can initiate these via the FN.
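The idea that only images from the associated registry can run could be expressed as a simple check like the one below. This is a minimal sketch, not the FN's actual enforcement logic, which is not described in this post; the function names and the registry host are hypothetical.

```python
def image_registry(image_ref: str) -> str:
    """Return the registry host of a Docker image reference.

    Follows Docker's convention: the first path component is a registry
    host only if it contains '.' or ':' or is 'localhost'; otherwise the
    image is assumed to live on Docker Hub.
    """
    first, sep, _rest = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_allowed_image(image_ref: str, node_registry: str) -> bool:
    """True if the image is hosted in the registry tied to this node."""
    return image_registry(image_ref) == node_registry


# Hypothetical registry host, for illustration only.
NODE_REGISTRY = "registry.example.org"

print(is_allowed_image("registry.example.org/analytics/mean:1.0", NODE_REGISTRY))
print(is_allowed_image("analytics/mean:1.0", NODE_REGISTRY))
```

The first call refers to an image in the node's registry and the second to an image that resolves to Docker Hub, so only the first would be accepted under this check.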
Full details of the FN tech stack and comprehensive deployment instructions will be available from the project GitHub repo when the FN is made open-source.
The FN will be available as an open-source project from October 2024 under the GNU GPL v3 license.
Releasing under this license means that the Federated Node will be free to use, and that other projects can modify and distribute the source code as they need, while ensuring that any subsequent projects based on the FN must also be open-source.
Initial development of the FN is being carried out for the PHEMS project. As noted in the previous blog, PHEMS has two high-level use cases:
Use Case 1 | Use Case 2 |
---|---|
Hospitals in the PHEMS network share benchmarking data for clinical outcomes | Hospitals in the PHEMS network pool their data to train machine learning (ML) models |
The Federated Node already supports use case one, and developing use case two is one of our immediate priorities:
• Priority 1 – responding to initial user feedback to ensure the FN is easy to deploy and all supporting documentation is as clear as possible.
• Priority 2 – evaluation of open-source ML frameworks, and trial integrations with the Federated Node to meet PHEMS use case two. We aim to have a PoC for this use case deployed in Q1 2025.
In addition to the above, we have a number of other development possibilities under consideration:
• Developing a beacon endpoint that will give users limited insight into the federated data before requesting access
• Introducing L1 federation capability
• Improved management functionality, including an edit dataset endpoint
• Integration with other federated analysis frameworks, e.g. DataSHIELD
If you would like to know more about the Federated Node, Ross will be discussing it as part of a public PHEMS meetup on October 22: attendees can sign up here.
September 24, 2024
Ross joined the Aridhia Product Team in January 2022. He is the Product Owner for FAIR Data Services and for Aridhia's open-source federation project. He works with our customers to understand their needs, and with our Development Team to introduce new features and improve our products. Outside of work, he likes to go hill walking and is slowly working his way through Scotland's Munros.