One of the key puzzles of the Covid-19 pandemic has been why some people experience a life-threatening, or fatal, illness, while others have few or no symptoms. For some, recovery is straightforward, while others are left with lingering or even permanent damage. Is this determined by luck, by environmental factors, or by something in the patient's genes?
Genomics England, set up by the UK government to sequence the genomes of NHS patients with rare conditions, recently announced plans to research the effects of the virus to determine whether high-risk patients can be identified and personalised treatments developed. Aiding them in the task will be the cloud computing power of AWS and the expertise of biomedical data processing company Lifebit.
This is just the latest application of genomics, a process of analysing the genetic code of living things, from patients to the viruses infecting them, and using that to unlock new treatments. The capabilities of genomics depend to a large extent on high-performance computing.
DNA is the code that carries the instructions for the development and reproduction of all known organisms, and sequencing it is a way of ‘reading’ that code. The first techniques were developed in the late 1970s and the first complete sequencing of the human genome was finished in 2003. We still understand only a tiny percentage of the blueprints we have uncovered. The function of most genes is unknown.
However, the sequencing taught scientists a lot about the principles of human health and in 2008 the first personal genome was sequenced - allowing researchers to look at tiny variations in our genetic makeup that make us more prone to certain health problems or diseases. Genomics England was set up with the goal of sequencing 100,000 personal genomes.
The genome of a virus is made up of either DNA or RNA. All coronaviruses, including SARS-CoV-2, the virus that causes Covid-19, are RNA-based, but the principle of sequencing is the same. Scientists analyse the genetic sequence to understand how the virus infects humans and what it does afterwards.
DNA is formed of pairs of four base chemicals, strands of which coil together in a double helix. The human genome contains around three billion base pairs, which is a vast amount of data to analyse. The Human Genome Project took 13 years to complete and cost almost $3bn. In 2017, Intel showed how its Broad-Intel Genomics Stack (BIGstack) could sequence up to five whole genomes per day and at much lower cost. Sequencing is now estimated to cost under $1,000 per genome and the cost is falling.
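To give a sense of scale, here is a back-of-the-envelope sketch in Python of how much data three billion base pairs represents. The function names are illustrative, and the 30x read depth and the two bytes per sequenced base (one for the base call, one for its quality score) are assumptions chosen for the example rather than figures from any of the projects mentioned here.

```python
# Rough data-volume estimate for one human genome.
# BASE_PAIRS is the approximate figure quoted above; everything else
# is an assumption made for illustration.

BASE_PAIRS = 3_000_000_000   # ~3 billion base pairs
BITS_PER_BASE = 2            # four bases (A, C, G, T) fit in 2 bits


def packed_genome_bytes(base_pairs=BASE_PAIRS, bits_per_base=BITS_PER_BASE):
    """Bytes needed to store one finished sequence, tightly packed."""
    return base_pairs * bits_per_base // 8


def raw_read_bytes(base_pairs=BASE_PAIRS, coverage=30, bytes_per_base=2):
    """Bytes of raw sequencer output if every position is read
    `coverage` times (assumed 30x) and each base is stored with a
    quality score (assumed 2 bytes per sequenced base)."""
    return base_pairs * coverage * bytes_per_base


if __name__ == "__main__":
    print(f"Packed genome: {packed_genome_bytes() / 1e6:.0f} MB")
    print(f"Raw reads:     {raw_read_bytes() / 1e9:.0f} GB")
```

Under these assumptions the finished sequence is modest in size, but the raw reads behind it are hundreds of times larger, which is where storage and processing power come in.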
The reason for that is twofold. First, sequencing methods have improved. Next-Generation Sequencing (NGS) examines small DNA fragments in parallel, checks them multiple times for accuracy, then pieces them together at the end. Second, advances in data storage costs and processing speeds have made it possible to use these techniques at speed and at lower cost.
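The idea of reading many overlapping fragments and letting them correct one another can be illustrated with a toy Python sketch. It is a simplification, not how any particular NGS pipeline works: it assumes each fragment's starting position is already known, whereas real pipelines must first work that out by aligning reads to a reference or assembling them from scratch. The function name and the sample reads are made up for the example.

```python
from collections import Counter


def consensus(reads, length):
    """Build a consensus sequence by majority vote.

    reads  -- list of (start, fragment) tuples; the start positions
              are assumed to be known already (a simplification)
    length -- length of the sequence being reconstructed
    Positions that no fragment covers are reported as 'N'.
    """
    columns = [Counter() for _ in range(length)]
    for start, fragment in reads:
        for offset, base in enumerate(fragment):
            pos = start + offset
            if 0 <= pos < length:
                columns[pos][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )


# Six short overlapping fragments of the made-up sequence ACGTACGGTTA.
reads = [
    (0, "ACGTAC"),
    (2, "GTACGG"),
    (4, "ACGGTT"),
    (0, "ACGTAC"),
    (3, "TTCGGT"),  # read error: T instead of A at position 4
    (5, "CGGTTA"),
]

if __name__ == "__main__":
    print(consensus(reads, 11))
```

Because position 4 is covered by five fragments and only one of them is wrong, the vote still recovers the correct base. This is why checking each position multiple times improves accuracy.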
As genomics projects expand to include cohort and population-level research, Lenovo has begun working with data centres to help them plan their HPC resources for optimal genome sequencing activity.
The speed and cost improvements to genomic sequencing have made it ideal for the Covid-19 pandemic. Previous viruses could not be analysed so quickly, and that is one of the reasons why so many vaccine candidates are already being developed around the world. The next step will be projects like that of Genomics England, to try to understand individual responses to the virus.
The results of this work will be useful for Covid-19 and for future pandemics, but they also represent a vital step forward in developing personalised medicine.