OpenCB: an open-source big data platform for genomic data

Earlier this month, our Software Quality Manager, Pablo Marin-Garcia, was in Vienna for the European Society of Human Genetics’ 2022 conference. In addition to attending some enlightening academic sessions, he presented a poster on some of the latest developments in OpenCB.

What is OpenCB?

Simply put, OpenCB is an open-source software platform for analysing genomic data, started in 2012 and maintained by a community of developers. Built on big data technologies, it can handle the demands of genomic data at scale.

As high-throughput screening technologies have advanced, genomic data has been produced in increasing volumes. Current large-scale clinical genomics studies can involve thousands of whole genome sequences, which amount to terabytes of data. Bioinformaticians now face the challenge of storing and analysing this volume of data, which legacy technologies can no longer support.

Meeting this challenge requires software and data management platforms that can handle the big data age of genomics while delivering on performance, scalability, flexibility, robustness, accuracy and security. Hence the creation of OpenCB.

How can OpenCB be applied?  

Because it is an open-source project, OpenCB can be adapted and extended to a wide range of uses. Currently, three main projects are developing specific functionality:

  1. CellBase, a MongoDB database used for querying genomic annotations
  2. OpenCGA, a variant and clinical data store based on MongoDB and HBase, with Solr indexing
  3. IVA, a web-based analysis client for OpenCGA

Deployment is based on Docker and Kubernetes under CI/CD. 

Major use cases of OpenCB’s features include population-scale and clinical interpretation analysis for rare diseases and cancer. Because of the power of the software to handle big data, queries and analysis can be done in real time. This work can be conducted under a strict data access authorization model, to maintain privacy compliance.  

You can read more about OpenCB in Zetta’s poster for ESHG 2022 below:  

To speak to a member of our team about the poster or the power of OpenCB, please email info@zettagenomics.com
