Datasets Library Service

Facilitate a collaborative knowledge commons: give your stakeholders access to a ‘single source of truth’ that makes it easier to collaborate and adhere to standards across the project lifecycle.

Our Datasets Library Service is used to maintain a single source of truth for clinical dataset definitions across an entire project lifecycle, making it easier for your project team to share information and collaborate, no matter where they are based.

The Datasets Library is a secure website that provides field-level specifications of data elements, drawn from project-specific datasets or from national and international standards. It supports improved adherence to standards across multiple teams and organisations and facilitates a collaborative knowledge commons for data-driven projects.

Our data services help get your project underway quickly and easily: simply choose what you need, add it to your existing AnalytiXagility subscription, and you’re ready to go.

The standards held in the library can be national or international (such as those for notifiable diseases), or project-specific. The library comes pre-loaded with exemplar datasets in risk of readmission (PARR30), cancer outcomes (COSD), diabetes, renal cancer and other domains, and gives you the ability to host your own project-specific datasets.

The website is fully searchable, so users can find and focus on the specific aspects they need for their use case. This supports the ‘knowledge commons’, where good practice and previous work can be re-used. An API is also provided for programmatic use.

Aridhia provides an Excel template that can be used to record the dataset definitions, lookup lists and external references. The template can also capture information governance requirements, for example by flagging personal health information (PHI) fields, and can generate configuration files for Aridhia’s Data De-identification Service. The Library offers full version tracking of datasets, accelerating requirement capture by business analysts while ensuring that all project stakeholders have access to the most up-to-date definitions.

For the technical user, a set of utilities is provided that turns the Excel template into useful snippets of SQL for table definitions, views and queries. This saves time when prototyping or implementing a data integration or storage strategy.
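As an illustration of the idea only, the sketch below shows how field-level definitions might be turned into a SQL table definition. It is not Aridhia's utility: the `FieldDefinition` class, the `create_table_sql` function and the example field names are hypothetical, and the real Excel template layout is not assumed here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """One row of a dataset definition (hypothetical layout, for illustration)."""
    name: str        # column name
    sql_type: str    # SQL data type, e.g. TEXT, DATE, INTEGER
    required: bool   # True if the field is mandatory


# Only plain identifiers are accepted, so names can be quoted safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_table_sql(table_name, fields):
    """Build a CREATE TABLE statement from a list of field definitions."""
    for identifier in [table_name, *(f.name for f in fields)]:
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"Not a valid identifier: {identifier!r}")
    columns = [
        f'    "{f.name}" {f.sql_type}' + (" NOT NULL" if f.required else "")
        for f in fields
    ]
    return f'CREATE TABLE "{table_name}" (\n' + ",\n".join(columns) + "\n);"


# Hypothetical example fields; not taken from any published dataset.
EXAMPLE_FIELDS = [
    FieldDefinition("patient_id", "TEXT", True),
    FieldDefinition("admission_date", "DATE", True),
    FieldDefinition("discharge_date", "DATE", False),
    FieldDefinition("readmitted_within_30_days", "INTEGER", False),
]

print(create_table_sql("admission", EXAMPLE_FIELDS))
```

Keeping the definitions as data and generating the SQL from them means a change to a field definition is made in one place, which is the same principle behind the single source of truth described above.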

Where XML is used, a schema file can be generated to validate content and structure, accelerating the use of standards, and preventing or reducing errors in data transmission.
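A minimal sketch of schema generation, under stated assumptions: the `build_xsd` function and the example fields are hypothetical and do not represent the service's own generator. It uses only the Python standard library to emit a small XML Schema (XSD) in which required fields must appear and optional fields may be omitted. Validating documents against the result would need a separate XSD-aware validator, which is not shown.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialise the schema namespace with the xs: prefix


def build_xsd(record_name, fields):
    """Return an XSD string describing one record with the given fields.

    fields is a list of (name, xsd_type, required) tuples.
    """
    schema = ET.Element(f"{{{XS}}}schema")
    record = ET.SubElement(schema, f"{{{XS}}}element", name=record_name)
    complex_type = ET.SubElement(record, f"{{{XS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for name, xsd_type, required in fields:
        ET.SubElement(
            sequence,
            f"{{{XS}}}element",
            name=name,
            type=xsd_type,
            # minOccurs="0" makes a field optional in the XML document
            minOccurs="1" if required else "0",
        )
    return ET.tostring(schema, encoding="unicode")


# Hypothetical example fields; not taken from any published dataset.
EXAMPLE_XML_FIELDS = [
    ("patient_id", "xs:string", True),
    ("admission_date", "xs:date", True),
    ("discharge_date", "xs:date", False),
]

print(build_xsd("admission", EXAMPLE_XML_FIELDS))
```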

The roadmap for this service includes support for a range of further formats, including JSON and Apache Avro, and the provision of synthetically generated data.


Features

  • Web-based secure service
  • Field-level information on national and international standards
  • Pre-populated with exemplar datasets
  • Template-driven with version tracking
  • Turns dataset definitions into snippets of SQL, XML schema files, etc.
  • RESTful API provided for programmatic use
  • Pan-government accredited hosting at Business Impact Level 2


Benefits

  • Provides a single source of truth for dataset definitions
  • Facilitates the ‘knowledge commons’ for good practice and re-use
  • Accommodates project-specific datasets
  • Supports capture of information governance requirements
  • Ensures standards are adhered to across multiple teams and organisations
  • Prevents/reduces errors in transmitting data from one organisation to another

Get in touch to find out how we can get you started.