Over the past decade, there has been explosive growth in the sheer quantity of human genomic data. The first human genome was sequenced in 2003, after more than a decade of international scientific cooperation and at a total cost of over $3 billion. By 2010, more than a thousand genomes had been sequenced (thanks in no small part to the 1000 Genomes Project, which started in 2007). Just over a decade later, by 2021, the number had reportedly surpassed 30 million.
What’s exciting is the huge potential of this data, especially when large data sets of many genomes can be analysed together. This opportunity has driven the development of genome-optimised analysis platforms. But to start realising this potential, these platforms must overcome several key challenges that come with such large volumes of genomic data.
Zetta Genomics, which spun out of the University of Cambridge, explores four key challenges that organisations face when building genome-optimised analysis platforms, along with their solutions.

Delivering performance at scale
The central requirement of a genome-optimised platform is, of course, that it can perform complex genomic analysis on the fly. It cannot be limited by the number of genomes included in an analysis, as this would restrict the platform’s usability and potential applications. We set ourselves a challenge: our platform had to be able to query and analyse 100,000 whole genomes in real time. Setting this ambition was essential in pushing the boundaries of existing indexing approaches and in developing our own. It was important to us that our platform was genome-oriented, rather than forced into an approach designed for other types of data, which would invariably have limited the insight the platform could eventually deliver.
In one data set alone, we have now indexed over 200 terabytes of data, including over a billion unique variants. The result is a platform that can run complex queries integrating both genomic and clinical information in just a few seconds. The data can then be dynamically enriched and analysed in real time to provide more meaningful insight.
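The article does not describe how XetaBase indexes its data, so the following is only a toy illustration of the general idea behind a query that combines a genomic filter with a clinical one. It assumes a hypothetical in-memory index that keeps each chromosome’s variants sorted by position (so a region can be located by binary search rather than a full scan) and a hypothetical mapping from sample IDs to phenotype labels. All names, coordinates and samples are made up; a production index over billions of variants would look very different.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


class VariantIndex:
    """Toy variant index (hypothetical, not XetaBase's design).

    Keeps each chromosome's variants sorted by position so a region can be
    located by binary search, plus a map from each variant to its carriers.
    """

    def __init__(self, calls: Iterable[Tuple[str, Variant]]):
        self.carriers: Dict[Variant, Set[str]] = defaultdict(set)
        for sample_id, variant in calls:
            self.carriers[variant].add(sample_id)
        by_chrom: Dict[str, List[Variant]] = defaultdict(list)
        for variant in self.carriers:
            by_chrom[variant.chrom].append(variant)
        self.by_chrom = {c: sorted(vs, key=lambda v: v.pos) for c, vs in by_chrom.items()}
        # Parallel position lists let bisect find region boundaries directly
        self.positions = {c: [v.pos for v in vs] for c, vs in self.by_chrom.items()}

    def region(self, chrom: str, start: int, end: int) -> List[Variant]:
        """Variants with start <= pos <= end, found without scanning the chromosome."""
        pos = self.positions.get(chrom, [])
        lo, hi = bisect_left(pos, start), bisect_right(pos, end)
        return self.by_chrom.get(chrom, [])[lo:hi]


def region_query(index: VariantIndex, clinical: Dict[str, Set[str]],
                 chrom: str, start: int, end: int, phenotype: str) -> Dict[Variant, List[str]]:
    """Combine a genomic filter (region) with a clinical filter (phenotype)."""
    cases = {s for s, phenotypes in clinical.items() if phenotype in phenotypes}
    result = {}
    for variant in index.region(chrom, start, end):
        hits = index.carriers[variant] & cases
        if hits:
            result[variant] = sorted(hits)
    return result


# Made-up toy data for illustration only
calls = [
    ("S1", Variant("chr1", 100, "A", "G")),
    ("S2", Variant("chr1", 100, "A", "G")),
    ("S3", Variant("chr1", 250, "C", "T")),
    ("S1", Variant("chr2", 900, "G", "A")),
]
clinical = {"S1": {"seizures"}, "S2": set(), "S3": {"seizures"}}
index = VariantIndex(calls)
hits = region_query(index, clinical, "chr1", 50, 300, "seizures")
print(hits)
```

Here S2 carries the first variant but is excluded because it lacks the phenotype, which is the kind of genomic-plus-clinical intersection the paragraph above describes.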
Ensuring security while enabling collaboration
All of this data and functionality is not enough unless it is supported by a robust data management system. Users must be able to organise their genomic data and combine it with clinical information to perform analyses and interpretations. Further power comes from enabling users to collaborate effectively within the platform, all drawing relevant findings from centralised data that is consistent and easy to update.
Making a platform cloud-friendly is key to enabling collaboration between multiple organisations, such as academic institutions, clinical providers and research organisations. A cloud-native solution enables faster deployment, because much of the infrastructure needed to support genomic big data applications, like monitoring systems, encryption and security, is available straight ‘out of the box’. Software and information updates are delivered regularly, helping the solution keep pace with user needs and security requirements. A cloud approach is also well suited to scaling, since storage can be added incrementally without the operational costs of buying, expanding and maintaining a data centre.
On the other hand, not all users need exactly the same access, data or functionality, so designing a fully customisable platform was key. For our XetaBase platform, we used a REST API that not only allows interaction with the genomic and clinical data but also gives users the freedom to manage and customise the platform.
Clinical interpretation and analysis
In clinical settings, professionals have limited time to spend on each case, so a truly informative, automated decision support system can save a great deal of time. The suite of tools in XetaBase allows clinicians to manage individual cases, check key metrics and interpret individual data without losing the cohort perspective. These are all key operations that clinicians need to perform in order to improve patient outcomes. XetaBase now features automated interpretation methods and the ability for users to design their own custom analysis workflows. These reduce the manual burden of repeating many common interpretations, enabling clinical users to vastly reduce the time spent on each case. In turn, they can address far more cases than before.
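The article does not say how XetaBase’s interpretation methods or custom workflows are built. As a hedged sketch of the general pattern only, a user-defined workflow can be modelled as an ordered list of filter steps that is applied identically to every case, so a common interpretation is defined once rather than repeated by hand. The annotation fields (`gene`, `consequence`, `population_af`), the gene names and the thresholds below are all hypothetical and chosen purely for illustration, not clinical guidance.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class AnnotatedVariant:
    gene: str
    consequence: str      # e.g. "missense", "stop_gained", "synonymous"
    population_af: float  # allele frequency in a reference population


# A workflow step takes a list of candidate variants and returns the survivors
Step = Callable[[List[AnnotatedVariant]], List[AnnotatedVariant]]


def in_panel(genes: Iterable[str]) -> Step:
    panel = set(genes)
    return lambda vs: [v for v in vs if v.gene in panel]


def rare(max_af: float) -> Step:
    return lambda vs: [v for v in vs if v.population_af <= max_af]


def damaging(consequences: Iterable[str]) -> Step:
    allowed = set(consequences)
    return lambda vs: [v for v in vs if v.consequence in allowed]


def run_workflow(steps: List[Step], variants: List[AnnotatedVariant]) -> List[AnnotatedVariant]:
    """Apply each step in order; the same workflow can be reused on every case."""
    for step in steps:
        variants = step(variants)
    return variants


# A user-defined workflow: hypothetical gene panel and thresholds
workflow = [
    in_panel({"GENE_A", "GENE_B"}),
    rare(0.001),
    damaging({"stop_gained", "missense"}),
]

case_variants = [
    AnnotatedVariant("GENE_A", "stop_gained", 0.0001),  # passes every step
    AnnotatedVariant("GENE_A", "synonymous", 0.0002),   # removed by damaging()
    AnnotatedVariant("GENE_B", "missense", 0.05),       # removed by rare()
    AnnotatedVariant("GENE_C", "missense", 0.0),        # removed by in_panel()
]
candidates = run_workflow(workflow, case_variants)
print(candidates)
```

Keeping each step as a small composable function is one way to let users assemble their own workflows without writing a full pipeline each time.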
Designing for the future of genomic medicine
Automated interpretation also makes it possible to run constant reanalysis, meaning that insights currently hidden in sequenced genomes could be discovered over time and provide answers for undiagnosed cases, without the need for manual intervention. This hints at the ultimate, exciting potential of XetaBase. We have already seen the real-life impact of spotting genomic insights earlier, improving clinical care from diagnosis to prognosis and treatment.
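How XetaBase schedules reanalysis is not described here. A minimal sketch of the idea, assuming a hypothetical knowledge base that is simply a set of variants currently classified as pathogenic, is to compare each undiagnosed case against only the classifications added since the last run, so new findings surface automatically. The variant keys, case IDs and function names below are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def interpret(case_variants: Set[VariantKey], pathogenic: Set[VariantKey]) -> Set[VariantKey]:
    """Toy interpretation: the case's variants that the knowledge base flags."""
    return case_variants & pathogenic


def reanalyse(cases: Dict[str, Set[VariantKey]], undiagnosed: Set[str],
              old_kb: Set[VariantKey], new_kb: Set[VariantKey]) -> Dict[str, List[VariantKey]]:
    """Re-check undiagnosed cases against classifications added since the last run."""
    newly_pathogenic = new_kb - old_kb  # only the delta needs checking
    findings = {}
    for case_id in sorted(undiagnosed):
        hits = interpret(cases[case_id], newly_pathogenic)
        if hits:
            findings[case_id] = sorted(hits)
    return findings


# Made-up toy data for illustration only
cases = {
    "C1": {("1", 100, "A", "G"), ("2", 500, "C", "T")},
    "C2": {("1", 250, "G", "T")},
    "C3": {("3", 42, "T", "C")},  # already diagnosed
}
undiagnosed = {"C1", "C2"}
old_kb = {("3", 42, "T", "C")}
new_kb = old_kb | {("2", 500, "C", "T")}  # a new pathogenic classification arrives
findings = reanalyse(cases, undiagnosed, old_kb, new_kb)
print(findings)
```

Checking only the difference between knowledge-base versions keeps each reanalysis cheap; a fuller version would also need to handle classifications that are downgraded or withdrawn, which this sketch ignores.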
Applied at the population level, this kind of constant genomic care could be revolutionary for our health services: identifying disease trends faster; intervening in individual cases earlier and with better prognoses; targeting treatments to achieve better patient outcomes; and helping to design future-proofed services.
We believe that XetaBase will be a key enabler in bringing genomic medicine to more and more patients, by alleviating some of the current technological challenges of genomic data management and interpretation.
About the author and more information
In January, Marta Bleda Latorre, our Lead Bioinformatics Consultant at Zetta Genomics, attended the Festival of Genomics 2023. As well as meeting colleagues from the genomics community, she spoke as a guest in Microsoft’s presentation on powering genomic data analysis and developing biomedical platforms.
To speak to a member of our team about Zetta’s participation in the Festival of Genomics or the development of the XetaBase platform, please email info@zettagenomics.com.