In genomics, size really does matter: the larger the dataset, the greater the returns. In the last two decades we have moved from a single genome to datasets of over 100,000 genomes. In the next decade, datasets of millions will become commonplace. It’s clear, however, that having a big dataset is not enough on its own. What will usher in a new era of precision medicine for millions across the world is our ability to query, access and interpret these treasure troves, and to uncover the actionable insights within them.
One genome to 100,000: scale
To understand where we are in the genomics revolution, we must know where we’ve come from. In 2003, we mapped the human genome for the first time. It took over a decade, some of the most brilliant minds on the planet and billions of dollars. In the years that followed, the number of sequenced genomes grew, giving us a tantalising glimpse of precision medicine’s potential.
By the end of that decade, genome sequencing was moving into the clinic, as the case of Nicholas Volker demonstrated: he was the first child whose treatment was based on a diagnosis determined by genome sequencing. Each genome required highly specialised and extremely rare know-how, and often one-off technologies. The work was time-consuming and costly, and it delivered limited results because there was little reference information available to drive interpretation. Genomic medicine seemed too complex and expensive to scale.
In 2013, the UK government changed the game by setting up Genomics England to sequence an unprecedented (and, many thought, impossible) 100,000 genomes within five years. Importantly, the project wasn’t simply about research findings from large numbers of sequenced genomes (although that played a part), but also about the clinical utility of the data they yielded. This foundational aim was a world first: a population-level genomic medicine service. This wasn’t just a step forward; it was a gigantic leap.
When the 100,000th genome was sequenced in 2018, I was working at Genomics England with my Zetta Genomics co-founder, Ignacio Medina. Our goal was to turn the hitherto specialised art of genomic analysis into an industrial process using modern big data technologies. In terms of data management, this essentially meant starting from scratch.
100,000 genomes to 100 million: access, interpret and action
Why start from scratch? Most obviously, the data volumes in play simply overwhelm existing data technologies. But there’s a second, more subtle and intriguing reason: precision medicine demands a change to the in vitro diagnostic (IVD) paradigm, moving from a ‘linear assay-to-result workflow’ to an ‘iterative data-driven investigation’.
To explain: while we need to sequence a person’s genome only once, our ability to interpret it and gain value from it keeps improving. The key is to keep going back to the genome as other factors change, something existing ‘flat file’ data management systems fail to deliver. These ‘other factors’ fall into two categories:
1) changes in a person’s medical condition;
2) changes to our understanding of the genome.
As an example of the first factor, in neonatal genomics the condition of an acutely unwell infant can change rapidly, so we need to reinterpret their genome just as rapidly. While the data might yield no results in week one, changes in the infant’s condition by week two might reveal a genomic match that leads to a diagnosis, with the potential for earlier and more effective interventions. This dynamic, genome-driven approach stands in stark contrast to the diagnostic odysseys of rare disease, which can take years to resolve.
Turning to the second factor, a patient might present with a condition today and receive no genomic result because the link between their genome and that condition is not yet known. Yet in a month, a year or a decade, fresh genomic discoveries might provide that diagnosis directly from their existing data, without the need for any further clinical procedures.
This is, essentially, what Zetta Genomics’ XetaBase can provide: a genome-optimised, highly automated data management platform that continually reinterprets genomes in light of the latest information. When there is a ‘hit’ following the release of a new gene panel, for example, users are automatically alerted to the potential for a genomic diagnosis. Further, and critically, it makes this information secure yet easily accessible to researchers and clinicians wherever they need to act on it, whether that’s at the laboratory bench or at the patient’s bedside.
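To make the idea of re-checking stored genomes against a new gene panel more concrete, here is a minimal, hypothetical Python sketch. It is not XetaBase’s implementation or API. The `Variant` record, the placeholder gene names and the functions `find_new_hits` and `alerts_by_patient` are illustrative assumptions. The sketch shows only the core idea: variants are stored once per patient, they are re-checked whenever a gene panel is updated, and any variants in genes newly added to the panel trigger an alert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A stored variant call for one patient (hypothetical, simplified record)."""
    patient_id: str
    gene: str
    change: str  # free-text variant description, e.g. an HGVS-style string


def find_new_hits(variants, old_panel, new_panel):
    """Return stored variants that fall in genes added by a panel update.

    Assumption: genes already on the old panel were reviewed when that
    panel was first applied, so only newly added genes are re-checked.
    """
    added_genes = set(new_panel) - set(old_panel)
    return [v for v in variants if v.gene in added_genes]


def alerts_by_patient(hits):
    """Group hits by patient so each patient's care team receives one alert."""
    alerts = {}
    for v in hits:
        alerts.setdefault(v.patient_id, []).append(v)
    return alerts


# Toy example: placeholder gene names and variants, not real clinical data.
stored = [
    Variant("P1", "GENE_A", "c.100A>G"),
    Variant("P2", "GENE_B", "c.200del"),
    Variant("P2", "GENE_B", "c.350C>T"),
    Variant("P3", "GENE_C", "c.5C>T"),
]
old_panel = {"GENE_A"}
new_panel = {"GENE_A", "GENE_B"}

hits = find_new_hits(stored, old_panel, new_panel)
for patient, variants in alerts_by_patient(hits).items():
    print(f"Alert for {patient}: {len(variants)} variant(s) in newly added panel genes")
```

Running it prints a single alert for P2, because both of P2’s stored variants fall in GENE_B, the only gene the update added. The genome is not re-sequenced; only the interpretation is rerun. A real system would presumably query a variant database rather than a Python list, and would also rerun interpretation when a patient’s condition changes. This toy example leaves both out.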
100 million and beyond
I am hugely proud of Zetta Genomics and XetaBase, but not purely for personal or even business reasons. Although I co-founded Zetta alongside Ignacio, who is the architect of the OpenCB platform on which XetaBase is built, Zetta does not belong to us. It is the result of a truly astonishing collaboration, born out of pioneering work at Genomics England and the University of Cambridge and carried forward by the open-source community.
Even now, as we scale up to meet the needs of a rapidly expanding global genomics sector, we move forward with the support of far-sighted investors, commercial partners such as Microsoft, Fujitsu and Future Perfect Healthcare, and organisations such as the UK’s NHS and its Genomic Medicine Service. Together, we have built a solution that one customer describes as “completely unlike anything else out there”. That customer will use it to transform healthcare services, particularly in economically disadvantaged and underserved communities.
Designed to scale with demand and able to harness the power of millions of genomes, or even more, XetaBase will continue to unleash the potential of precision medicine for people across the globe.