18 Nov 2021

HPC and AI make genomic research feasible at population level

Given that Kao Data’s first of three data centre campuses is located in the heart of the UK Innovation Corridor, a hotbed for bioinformatics and life sciences, we are regularly involved in discussions about the latest advancements in this field.

Thanks to advances in next-generation sequencing (NGS) technologies, genomic research has thrived since its origins in the 1990s. The evolution of AI and machine learning, combined with the advent of high-performance computing (HPC), has accelerated progress further, reducing analysis times from days to hours, or even minutes when cutting-edge supercomputers such as NVIDIA Cambridge-1, hosted at our Harlow campus, are used.

Associated costs have also come down to the point where individual genes can be sequenced routinely. Indeed, some R&D facilities can now sequence millions of cases annually, and an entire genome can be sequenced for a fraction of what it cost just 24 months ago. Genome sequencing is a computational discipline that continues to surprise and amaze, such is the exponential rate of year-on-year progress.

The ability to carry out genomic research at population level brings with it a myriad of health and wellbeing benefits, because insight into our DNA makeup allows medical scientists to predict our susceptibility to future illnesses and chronic conditions before they affect us. Not only does this buy time for research labs and pharmaceutical companies to discover potential cures and roll out counteracting medicines, it also empowers individuals to take measures to prevent or delay the impact of those conditions, increasing their life expectancy and improving their quality of life.

Buying time for the researchers and pharmaceuticals

Since Covid-19 hit the headlines, genomic sequencing has emerged as a key tool in efforts to contain the disease, as health experts the world over have grappled with the inevitability of continued transmission until a) we reach “herd immunity” or b) entire populations are vaccinated.

The UK are world leaders in this space. Indeed, the Department of Health and Social Care published a press release at the beginning of October announcing that over one million SARS-CoV-2 genome sequences have been uploaded to the GISAID database (an international initiative originally set up to share information pertaining to avian influenza), accounting for nearly a quarter of all sequences published globally to date.

The GISAID database stores sequences from across the globe, and believe me, that’s a heck of a lot of data. The discipline has become so integral to advances in medical science in this country that the Government has established a dedicated, government-owned organisation for genomic sequencing, Genomics England.

Too much knowledge can be counterproductive

On the flip side, you could argue that genomic sequencing is accelerating to the point of becoming intrusive. According to an article in the Guardian, genetic detective work in Victoria, Australia, traced the second Delta variant outbreak back to one (very unfortunate) family, and the sequencing evidence significantly influenced the state’s subsequent quarantine measures. I can only imagine that family’s grief on realising they were the eye of the Covid storm in their neighbourhood…

Given that example, there is understandable concern that too much insight can be dangerous, leading to unnecessary anxiety, which affects mental health, and to aspirations for cures for conditions that are not currently treatable.

Genomic sequencing is data intensive

Regardless of our opinions on medical advances, social ethics and so on, one thing we know for sure is that genomic sequencing facilities around the globe are generating data in unfathomable quantities. This raises the question of how and where we are supposed to analyse and store this highly sensitive information. Capacity was initially managed by subdividing DNA samples into bitesize chunks (exomes etc). However, as technologies and processes have evolved and storage has become more sophisticated, it’s now possible to analyse DNA in its entirety, although this comes with obvious compute and bandwidth consequences.

As with all computer-generated research, genomic sequencing is data-intensive because millions of samples must be analysed to obtain meaningful patterns and trends. But how does this high-volume data impact data centres? Not only is the keyed-in data loaded with information, but when you combine it with AI and machine learning (also data heavy) to speed up processes and outcomes, the IT load needed is huge. And the more we utilise genomic sequencing for medical research, the more data will be produced.

This is where future-ready campuses and colocation facilities like ours come into play. Equipped with all-fibre networks for swift processing and interconnectivity via secure connections in line with personal data protection laws, we are able to provide the robust, high-density colocation infrastructure needed to drive research in this field. We are seeing more and more demand from life sciences organisations to host power-hungry hardware in the data centre, and in response we have moved quickly from a one-campus solution to a multi-site offering across three London-perimeter locations that together offer more than 55MW of capacity.

With NVIDIA Cambridge-1 already at our Harlow headquarters, along with EMBL’s European Bioinformatics Institute (EMBL-EBI) and smaller startups using AI and machine learning to explore niche bio-computational workloads including genetics, we’re well placed to support the UK’s continuing push to be a world leader in genetic sequencing.

Tom Bethell

Tom Bethell is one of Kao Data's Business Development Directors. With a background in IT infrastructure, Tom has been working specifically within the areas of data centre colocation and high-performance computing for a number of years.


