March 15, 2023 | Robert
Organizations that decide to invest in a research environment also need to consider whether to build it themselves or procure it from a commercial partner like Aridhia. Build versus buy for a software platform has been a common dilemma for many years, in healthcare as in any other major field. However, with the ease and flexibility of SaaS, it could be argued that the real question is whether there is a compelling reason not to buy an off-the-shelf product.
But does that hold true for research environments? Do they have intrinsic qualities which justify the build-your-own approach? In this blog I’d like to discuss the various factors involved and debate the pros and cons of build (which I will also refer to as DIY) versus buy.
What constitutes a trusted research environment?
We should first define what such an environment is, noting that they are called by several names, including Trusted Research Environments, Safe Havens, and Secure Data Environments. Trusted Research Environment (TRE) seems to be the most widely used term just now, so I will use that for convenience in this blog. The scope of a TRE is often underestimated; for example, whilst a locked down VM attached to some storage capacity may be suitable for single project analysis, it does not constitute a TRE as is now commonly understood. While there may be no single, canonical definition for such an environment, there are several characteristics that we consider as being necessary for a TRE, including:
- It provides approved users with access to a secure computing environment containing sensitive data, typically healthcare related.
- It has been architected and built to support use cases which can span clinical and academic specialties, often in collaboration with colleagues across multiple organisations.
- Data stays in the TRE and only leaves it through an audited approval process (sometimes known as an airlock).
- It has a repository of tools and services available for use in research projects. Tools will usually be a mix of both licensed and open-source software.
- Users participate in projects, often organized as workspaces.
- Resources such as data, code, etc. are not shared between workspaces, unless explicitly agreed to by relevant parties.
- The environment operates to an Information Security standard, typically ISO27001, and has been built to a model such as the “Five Safes framework”.
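To make the workspace and airlock characteristics above a little more concrete, here is a minimal sketch in Python. It is purely illustrative: the `Workspace` and `Airlock` classes, their method names, and the log fields are hypothetical and do not describe how any particular TRE is implemented. The idea it shows is simply that every egress request is logged, and that only files with a recorded approval are allowed out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Workspace:
    """Hypothetical project workspace; its files are private to it by default."""
    name: str
    files: set[str] = field(default_factory=set)


@dataclass
class Airlock:
    """Hypothetical egress gate: nothing leaves without a recorded approval."""
    audit_log: list[dict] = field(default_factory=list)
    _approved: set[tuple[str, str]] = field(default_factory=set)

    def _record(self, event: str, workspace: Workspace, filename: str, actor: str) -> None:
        # Every decision is appended to the audit log, including refusals.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "workspace": workspace.name,
            "file": filename,
            "actor": actor,
        })

    def approve(self, workspace: Workspace, filename: str, approver: str) -> None:
        """Record that an approver has cleared one file from one workspace."""
        if filename not in workspace.files:
            raise FileNotFoundError(f"{filename} is not in {workspace.name}")
        self._approved.add((workspace.name, filename))
        self._record("approved", workspace, filename, approver)

    def export(self, workspace: Workspace, filename: str, requester: str) -> bool:
        """Return True only if this exact file was approved for this workspace."""
        allowed = (workspace.name, filename) in self._approved
        self._record("exported" if allowed else "denied", workspace, filename, requester)
        return allowed


if __name__ == "__main__":
    study = Workspace("study-a", {"results.csv"})
    gate = Airlock()
    print(gate.export(study, "results.csv", "researcher"))   # refused: no approval yet
    gate.approve(study, "results.csv", "data-steward")
    print(gate.export(study, "results.csv", "researcher"))   # allowed after approval
    print([entry["event"] for entry in gate.audit_log])
```

A real airlock involves far more than this (disclosure checks, reviewer roles, file scanning, retention of the exported artefacts), which is part of the point: even the simple-sounding characteristics hide a good deal of engineering.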
I would add that a TRE should follow the FAIR data principles, and to do this it needs functionality such as a data catalogue for searching for datasets and a mechanism to get approval for accessing them. A research environment needs to consider the organisation’s governance model and the end-to-end lifecycle of a project, which includes the step of finding and accessing data.
How do you build a TRE?
If we consider the above characteristics as being typical of a TRE, let’s now consider how such an environment is built. Early versions of TREs were typically hosted on-premises and built and managed by an internal IT department. The internal IT team would create a virtualized compute environment which gave users access to various software tools and directories. The environment was often heavily locked down, sometimes to a physical access location, supporting a limited set of use-cases, with an approval process determining which resources could be accessed by a particular researcher or study.
While these implementations met the needs of basic research scenarios, many projects required more flexibility, scale, and functionality. This, combined with developments in health data science, information security, and in cloud services, meant that organizations started looking at alternatives to what could be achieved in-house. At Aridhia we recognised this and led the way in developing a cloud-based SaaS/PaaS offering, the Aridhia Digital Research Environment.
Whether being built in-house or as a commercial offering, there are many factors involved in building a TRE and I would like to call out five areas for particular attention, some obvious, others less so.
Admit the scope of what you are building
First, you need to recognize that this is a large undertaking and that you will be in it for the long haul. Moving on from those early versions, TREs now require a broad range of features and an ongoing roadmap of planned improvements. Their users are essentially conducting R&D in their chosen field, ranging from relatively straightforward statistical analysis to running complex pipelines and federated analysis with complex governance rules. The bar in terms of features, security, integrations, and tooling gets higher every year.
Building a TRE therefore needs an extensive range of skills and experience; it is not a part-time effort which can be pulled together by a small group of infrastructure engineers, as may have been the case 10 years ago. Nor is it a ‘project’ with a clearly defined beginning and end; it is a significant software development activity which will require ongoing enhancement and maintenance.
Understand the teams, tools, and practices you will need
Any significant software development activity requires:
- Various roles including a product owner, scrum master, architect, software developers, quality control engineers and infrastructure engineers.
- A rich set of skills in your preferred technical stack, in UX design, in database and back-end development, microservices, cloud engineering, security engineering, containerization etc.
- A suite of tools for source code control, an IDE, software project management, automated testing, vulnerability scanning, etc.
- Compute environments for development, integration testing, demos, penetration testing, etc.
People will then need to be organized into teams, split in whatever way you think works best, e.g., front-end and back-end teams, or teams based along feature lines. Techniques such as continuous integration, automation, shift-left testing, and frequent deployment cycles will need to be embraced.
Don’t forget all the other wraparound services which users will expect, such as a help desk, a knowledge base, training material, out-of-hours support, and so on.
Information security, privacy, and governance are vital aspects of a TRE. Most environments will store anonymized and pseudonymized data; however, the TRE must still treat this as sensitive data. The environment needs to be built from the ground up with security in mind, and certifications are a way of demonstrating this to the information governance team, the data providers, and the general user community you intend to work with. At a minimum, an investment in ISO27001 should be made, with a further extension to ISO27701 for GDPR compliance. Depending on geography, other certifications such as HITRUST for the US, or Cyber Essentials Plus for the UK, may be needed.
Achieving these certifications needs a significant investment, requiring dedicated FTEs and six-figure sums of money. For example, even with a mature information security management system and an ISO27001 certification in place, Aridhia spent a further 18 months of dedicated effort in achieving HITRUST to certify HIPAA compliance. We employ two full-time employees dedicated to running our information security program. On top of that, of course, comes all the time spent on this topic by our product, development, QC, operations, and enablement teams, not to mention our senior management team and DPO.
Institutions that already have ISO27001 in place for other environments may consider that extending the scope to build and run an in-house TRE should be relatively straightforward. Whilst this may be true for some controls such as HR security (ISO27001 – Annex A.7), what about System Acquisition, Development & Maintenance (ISO27001 – Annex A.14)? This control deals with the security requirements of the information system, looking at practices such as tracing security requirements through the development process, secure coding, and security testing. An in-house TRE must be built with a rigorous Software Development Lifecycle which will stand up to scrutiny from an external auditor.
Getting beyond the Minimum Viable Product
The MVP of a software development activity is a version with the minimum number of features required to make it useful for early adopters. Getting to an MVP requires a focused commitment of people, time, and resources. There are inevitable delays and missteps before reaching the point where there is something usable. There will be a surge of satisfaction from the development team when that MVP is delivered, and rightly so, before the inevitable feedback of “this is great, but on reflection what we really need is…”.
This is where reality dawns for many software development projects and the true Total Cost of Ownership (TCO) becomes apparent. The team now needs to adapt to staying up to speed with emerging requirements, developing new features, maintaining existing ones and dealing with the mounting technical debt which will have built up in the rush to get the MVP ready for use.
When building the TCO model for a software development, the entire lifetime must be considered, factoring in the years of ongoing activity required to maintain and enhance it, not just the MVP build effort.
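As a back-of-the-envelope illustration of that point, the sketch below adds an up-front build cost to a recurring annual cost for maintenance and enhancement. Every figure in it is a made-up placeholder chosen for easy arithmetic, not a real price or an Aridhia number, and the function names are my own. A real TCO model would also cover staffing, certification, hosting, and support.

```python
def total_cost_of_ownership(build_cost: int, annual_run_cost: int, years: int) -> int:
    """Up-front build cost plus the recurring cost for each year in service."""
    return build_cost + annual_run_cost * years


def mvp_share(build_cost: int, annual_run_cost: int, years: int) -> float:
    """Fraction of the lifetime cost accounted for by the initial build."""
    return build_cost / total_cost_of_ownership(build_cost, annual_run_cost, years)


# Hypothetical placeholder figures, for illustration only.
MVP_BUILD_COST = 500_000      # one-off cost of reaching the MVP
ANNUAL_RUN_COST = 300_000     # yearly maintenance and enhancement
LIFETIME_YEARS = 5

if __name__ == "__main__":
    total = total_cost_of_ownership(MVP_BUILD_COST, ANNUAL_RUN_COST, LIFETIME_YEARS)
    share = mvp_share(MVP_BUILD_COST, ANNUAL_RUN_COST, LIFETIME_YEARS)
    print(f"{LIFETIME_YEARS}-year TCO: {total:,}")        # 5-year TCO: 2,000,000
    print(f"MVP share of lifetime cost: {share:.0%}")     # MVP share of lifetime cost: 25%
```

With these placeholder inputs the MVP accounts for only a quarter of the five-year cost. The exact ratio will differ for every organization, but budgeting for the MVP alone will understate the commitment.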
What’s the opportunity cost?
I believe a key question for an organization that wants to build its own TRE instead of buying one is: what is the opportunity cost of doing this? By that, I mean that the organization needs to be thinking about the bigger picture regarding its research goals, i.e.:
- What projects and studies does it want to support?
- What datasets does it want to share?
- Does it plan to do data curation activities on the various datasets?
- Will it put a data steward team in place?
- Are there upstream (or downstream) systems that the TRE needs to integrate with?
- How will the organization promote the use of the TRE?
- What are the legal and governance processes that need to be put in place to enable all of this?
- Which external partners does it want to collaborate with in healthcare, pharma, and academia?
These activities will require a cross-functional team within the organization with skills in research, information governance, data science, systems integration, data curation, and legal. This is where the opportunity cost comes in. If you are doing all the above and trying to assemble a large software development team to design and build the TRE, that is going to absorb a huge amount of time and energy. The software development element can easily push the above activities into the background, to the detriment of the overall goal, which is to enable research.
Buying a TRE
Commercial TREs have been available for many years now, in Aridhia’s case for 10. Buying a TRE means finding and collaborating with a vendor who can act as a trusted partner for the long term. Many large-scale initiatives run tender processes to evaluate the various vendor offerings and ensure the best fit with their requirements. Buyers will typically consider the following characteristics:
- Functional requirements such as tools needed by the end user
- Non-functional requirements such as scalability and performance
- Security certifications the vendor must have in place
- Service Level Agreements on availability, help desk response times, etc.
- A fully managed service – monitoring, alerting, patching, upgrades, and more will ideally be done by the vendor
- Pricing over the contract period
- Timescales to build the environment and on-board data and users.
Often the buyer will speak to other customers of the vendor to build up a picture of its ability to deliver, the total cost of ownership, and the outcomes they have achieved. All of this helps the decision-making process and helps the organisation manage risk.
Buyers will of course want to avoid vendor lock-in when choosing a supplier to work with. Ensuring that open-source tools and data formats are used within the platform should mean that assets can easily be moved if needed. Customers should own all assets deposited to and created within the TRE – vendors should be data processors only with no rights to the data or any IP over code, models, etc. developed by users within the TRE.
Once the appropriate platform has been chosen, the deployment and set-up process can begin. Buyers should find a major advantage here in that the time to value is much shorter. SaaS platforms can typically be deployed and set up in days or weeks, as opposed to waiting a long time (years?) for a DIY solution to appear.
Building your own TRE
The perceived advantages for an institution in building its own software are primarily around control and ownership. Knowing the user base and their needs well can be an argument for building a tailored solution. In theory, this familiarity could deliver exactly what users need, as opposed to a more generic solution which has been built with a broader user/customer base in mind.
Ownership of the intellectual property and the underlying platform code may be a factor too. For example, the organization may wish to develop a service which it can then offer to similar institutions for a fee, thereby recouping some of the development costs. I have seen this approach and understand why it can seem persuasive to whoever is funding the TRE; however, I would repeat the points above regarding the true TCO and the opportunity costs of the software build element. An organization can be successful in providing research services for a fee, without having to build all the underlying component parts itself.
Data sovereignty and security concerns around hosting in the cloud were often reasons to pursue an on-premises, in-house development. However, we have gradually seen those barriers to cloud adoption being removed; in fact, most public procurement exercises for TREs now stipulate that the solution must be cloud-based, which leads nicely into the next topic.
Cloud adoption does raise an interesting question along the lines of, “can’t I just switch on a bunch of cloud services and knit them together to create a TRE?”. Cloud vendors would certainly like you to think so. Yes, the cloud does make things easier, providing you have the skills (and time) to choose, adapt, and integrate all the necessary components, and accept that you will need to build some yourself.
Similarly, if I wanted to celebrate a special occasion with my wife, I could take her out for a meal at a top restaurant, or I could download some recipes and have a go myself. She might appreciate the effort of me having a go, but whether it will be a 5-star meal and an occasion she will want to repeat, I’m not so sure.
Open-source cloud TREs have also appeared, which can give your development team a head start in that various services will already have been integrated to provide a baseline service.
If you are considering going down this road, I would recommend looking into:
- What features are currently available and is there a roadmap for new features?
- How well maintained is it and how often is there a new release available? Software can go stale quickly if there are no bugs being fixed, libraries being upgraded, and vulnerabilities being resolved. A recent examination of one open-source TRE showed a flurry of activity for 3 months, and then no activity at all for the following 6 months. If I compare this with Aridhia, we have several teams dedicated to working on new features and general platform maintenance, with a new release every week.
- Remember that ongoing maintenance, certification, etc. is your responsibility. Open-source TREs are a template to build on, not a solution in themselves. Whether assembling it yourself from cloud services or extending an open-source cloud TRE, your organization will still need a team of software professionals with all the skills, tools, processes, certifications, etc. noted earlier in the blog.
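One simple way to put a number on the "how well maintained is it" question is to look at the longest quiet period between releases. The sketch below assumes you have already collected the release dates by hand, for example from the project's release page. The dates shown are invented for illustration and do not refer to any real project, and `longest_gap_days` is my own helper name.

```python
from datetime import date


def longest_gap_days(release_dates: list[date]) -> int:
    """Return the longest interval, in days, between consecutive releases."""
    ordered = sorted(release_dates)
    if len(ordered) < 2:
        return 0  # not enough history to measure a gap
    return max((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))


# Invented release history: a burst of activity, then a long silence.
EXAMPLE_RELEASES = [
    date(2022, 1, 10),
    date(2022, 2, 14),
    date(2022, 3, 21),
    date(2022, 9, 26),
]

if __name__ == "__main__":
    print(f"Longest gap between releases: {longest_gap_days(EXAMPLE_RELEASES)} days")
```

Release cadence is only a proxy, of course. It is worth pairing it with a look at open security issues and how quickly dependency upgrades are merged.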
Most importantly, I would ask the cloud vendor whether there are any success stories for this approach. Ask to speak to an existing customer (both the IT team and researchers) who has gone down this road and considers it to have been a good use of time and money, with successful organisational outcomes that end users will attest to.
And the answer is…
Clearly Aridhia has a stake in this argument given that we have developed a commercial TRE offering. However, we often come across prospective customers who tried the build-your-own approach, only to come and talk to us a few years later. They found that the effort, cost, and time involved in DIY were just not worth it. In comparison, a commercial TRE should be quick to deploy and available on a SaaS subscription model, with clear pricing tiers which allow the customer to pay as their needs scale.
So, going back to the original question, is there a compelling reason for an organisation not to buy a commercial TRE? I would argue that only a small number of large, well-funded organisations can justify the DIY approach. These organisations will already have a significant software development capability in place and extensive experience in this field which they want to retain.
For the majority, however, exploring the various commercial offerings and finding a suitable partner will save time and money and allow them to focus on the research itself, which is, after all, the raison d’être of a TRE in the first place.