Alison Meynert is Senior Research Fellow and Bioinformatics Analysis Core Manager at the MRC Human Genetics Unit – part of the Institute of Genetics and Cancer – at the University of Edinburgh. Alison talks with Zetta Genomics’ Stuart Jenks about the rollout of XetaBase to the EIDF Variant Repository Pilot Project – and its potential to help transform genomic research and, ultimately, clinical outcomes.
Could you tell us a little about the pilot project you are working on?
The Edinburgh International Data Facility (EIDF) Variant Repository Pilot Project will initially encompass 1,000 to 2,000 whole genome sequences (WGS), but this could rise significantly – to around 50,000 WGS or more – if we identify new collaborative opportunities.
While this is a project with no direct clinical objectives, data in this pilot supports research identifying genetic variants in rare diseases such as eye malformations and developmental disorders like Cornelia de Lange Syndrome, and population genetics of isolated islands in Scotland and Croatia.
A key objective of the MRC Human Genetics Unit is to discover not only which genes cause disease but also – and importantly – how the failure of these genes causes disease.
What genomic data management issues have you faced?
There are a number of challenges that include: the volume of genomic data we want to store; the ability to de-silo and access this data; synchronising information across projects; interoperability; and usability.
Existing systems aren’t built with the needs of genomic data in mind. Mostly file based, genomic data is stored as an artefact – where it’s doing very little good. It’s difficult to access, annotate, re-analyse and share – so the potential for additional insight and discovery is wasted.
We want our project to change this paradigm – to liberate the data we and other organisations hold to create dynamic and trusted research environments. Essentially, we need a genomic-native data management solution; one that is designed to meet our needs specifically and allows us to store, easily access, analyse and constantly re-interrogate data at scale and speed.
What brought you to XetaBase?
We need a data management system that’s built on proven technology, easily interrogated by bioinformaticians and non-bioinformaticians alike, copes with large volumes of data at speed – and makes this data securely accessible. Last, and by no means least, our data store needs to be compliant with ethical and regulatory requirements.
These demands have brought us to Zetta Genomics – a company we know through the open-source OpenCGA platform, and its architect and Zetta founder, Ignacio Medina. XetaBase is built on OpenCGA, so this was a natural next step.
How will XetaBase help you to address the challenges of genomic data management and achieve project objectives?
XetaBase was developed during Genomics England’s 100,000 Genomes Project, so we know that it is built on a proven technology platform, designed specifically for genomic data management at scale.
Adopting a single, open source, interoperable platform allows us to plug in a wide variety of information sources. We can de-silo and access huge volumes of currently under-utilised genomic data. Now, we are able to analyse variants within and between studies to cross compare cohorts. It also allows us to, for example, use indexing to rapidly access data and automatically synchronise variant annotation – removing time intensive duplication of data annotation across multiple projects.
We are able to access data from virtually anywhere, with customisable APIs for bioinformaticians proving particularly helpful. Beyond this, we’ve received very positive feedback on the usability of the IVA web browser – bringing both specialists and non-specialists the ability to easily ‘slice and dice’ the data.
Further, we can now open up data to a range of groups, with access granted according to their needs. Researchers, for example, can be given permissions based on their specific research fields, whilst clinicians can access data based solely on the individual or family they are treating. It will go a long way to creating compliant, trusted research environments that amplify the power of genomic data held around the world.
It means that data use is now smarter and more reliable, so we also reduce risks. With data now dynamic rather than an artefact, for example, we avoid issues such as data ‘timing out’ due to lack of use.
Data management is completely aligned to our project objectives, so we can better demonstrate project effectiveness – bringing greater confidence in project value to our sponsors.
Can you think of a day-to-day example of how XetaBase will help the MRC Human Genetics Unit?
We currently provide analysis for a number of MRC Human Genetics Research Groups. They email us, detailing their requirements, and we will go back to them to clarify, refine and fully identify their needs. Depending on complexity, this process can take a number of days – which can extend into weeks should they come back to us post-analysis with additional questions. Obviously, this takes up a lot of our and the research groups’ internal resources, and builds in delays to the provision of meaningful analysis and, by extension, to the delivery of care.
Using XetaBase’s IVA interface, this potentially weeks-long process will be reduced to minutes. Research groups can now define their own needs and refine results using the self-service web browser. It reduces time, resource, cost and – critically – gets the results of analysis to clinicians more quickly than ever before.
Can you see benefits for the NHS?
As NHS Scotland works through a new digital strategy, it’s clear that NHS systems are creaking under the strain of data they are being asked to hold. A cloud-hosted, hugely scalable and truly dynamic data storage and management solution like XetaBase offers compelling advantages.
If we extrapolate potential time, cost and human resource savings – as well as improved patient outcomes – across the NHS, we can begin to see how next generation data management technologies will help to transform both research and clinical care.
What’s next for genomic data management solutions?
Beyond our project objectives, we aim to act as an exemplar for next generation data management technologies – to capture and share learning from their implementation and use. Further, we’re aware that we have only just scratched the surface of how these technologies can have a positive research and clinical impact – and we will be working to push the boundaries.
It really is a case of the more that we do, the more that we learn – and the more that we learn, the more that we can do. Watch this space.