31 Jul 2020
In my last blog I discussed how Digital Twins could help guide the complex and intricate processes of designing and building next-generation data centres. In future, I believe industry professionals will leverage AI simulations to both test and optimise the data centre over its lifecycle.
As an operator of advanced data centres for the UK Innovation Corridor, Kao Data view two concurrent pathways leading us into the future: firstly, the building or infrastructure; secondly, the data centre itself, along with its compute power and connectivity. These paths do not run parallel, but touch, overlap and interweave, especially in recent times as we see the convergence between the ‘edge’ and ‘core’ of compute.
Today our sector must provide expert understanding of the most advanced computing system requirements needed to host and power AI applications. Externally, this is driven by low-latency connectivity and high-capacity dark fibre routes that distribute the data exactly as and when it’s required. Internally, high-throughput data networks must include node-to-node interconnectivity using networking such as InfiniBand from Mellanox/NVIDIA for direct support of HPC, linking the clusters required for parallel processing workloads.
What has become apparent is that within HPC, one size doesn’t fit all. Customers with HPC-based applications need customisable architectures that are future-proofed to flex and scale as hardware and server densities change. GPUs, for example, are increasingly deployed to boost performance, and storage requirements will evolve alongside them. Power provision has to meet the demand of the most intensive forms of AI, such as deep neural networks, machine vision and natural language translation. This surge in energy also requires a more synergistic cooling technology, able to cope with significant increases in heat from the latest generation of processors and their associated electronics. Direct-to-chip liquid cooling may well become the norm, and data centres will need to be ‘plumbed’ to cater for this transition.
The Uptime Institute recently stated the “average PUE ratio for a data centre in 2020 is 1.58, only marginally better than 7 years ago.” So at a time when average industry Power Usage Effectiveness (PUE) ratings appear to have plateaued, clearly we must pay closer attention to the changing needs of our customers, ensuring a keen eye is kept on potentially escalating energy use and carbon emissions.
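For readers less familiar with the metric, PUE is simply the ratio of total facility energy to the energy consumed by the IT equipment alone, so a ratio of 1.58 means 0.58 units of overhead (cooling, power distribution, lighting) for every unit of useful compute. A minimal sketch, using illustrative figures rather than any real facility’s measurements:

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

# A hypothetical facility drawing 1,580 kW in total to support a
# 1,000 kW IT load sits exactly at the 2020 industry average of 1.58.
print(round(pue(1580, 1000), 2))  # 1.58
```

A perfectly efficient facility would approach a PUE of 1.0, which is why a plateau at 1.58 leaves so much room for improvement.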
Situated in the heart of the London-Stansted-Cambridge Innovation Corridor – one of the UK’s hotbeds for HPC and AI – many of our customer conversations revolve around three main narratives:
Firstly, HPC and GPU-powered AI, which require specialist compute capabilities that are exceptionally power hungry and reliant on additional infrastructure technologies. They need specialist interconnect, dedicated server or direct-to-chip cooling, and extreme storage such as Hierarchical Storage Management (HSM) to support the high-throughput needs of applications whilst optimising the cost of the overall system.
Secondly, legacy data centres, which make up 95% of currently available UK facilities and were not designed to support HPC compute and its infrastructure. Most were designed for low-density enterprise servers drawing under 10kW per rack – ‘plug and play’ in comparison to HPC’s ‘bespoke’ needs of 50-80kW per rack. Many traditional data centres are dedicated to mechanically chilled, air-cooled strategies, which are expensive to run and inefficient at cooling HPC environments.
Thirdly, specialised data centres, inspired by hyperscale and OCP-Ready™ infrastructure; providing slab floors for heavier compute-dense servers, wide access data halls with no columns and no step access to optimise room layout. Here overhead power and connectivity infrastructure offers customisation, whilst highly efficient, industrial scale direct-to-chip liquid and hybrid air-cooling maximises heat extraction and efficient energy use, thereby reducing OpEx.
AI applications require bigger, hotter chips that can change the form factor of the servers, which in turn impacts the designs of racks, chassis and enclosures. These servers can consume up to 50kW each. Consider that against a recent AFCOM survey, which found the average data centre power load was 7.3kW per rack! To overcome this challenge, at Kao Data we already provide options (via our Technology Pods) for up to 80kW per rack, and can deploy liquid-cooled infrastructure as an advanced option for our HPC customers.
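To put that gap in context, a quick back-of-the-envelope comparison using the figures quoted above (illustrative arithmetic only):

```python
# Rack power densities cited in the text.
avg_rack_kw = 7.3    # AFCOM survey: average data centre rack load
hpc_rack_kw = 50.0   # a dense AI/HPC rack, per the figure above

ratio = hpc_rack_kw / avg_rack_kw
print(f"A dense HPC rack draws roughly {ratio:.1f}x the average rack load")
```

A single dense HPC rack therefore demands nearly seven times the power (and heat rejection) that an average facility was provisioned to deliver, which is why retrofitting legacy sites is so difficult.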
This change in processor power and the increase in energy usage demonstrate the need for data centre operators to collaborate with key industry organisations, such as ASHRAE TC9.9, the Open Compute Project (OCP) and the Infrastructure Masons.
Chip and GPU manufacturers such as AMD, Intel and NVIDIA are also members of these organisations and contribute to the development of guidelines that form the basis of much of the best practice in our industry. Involvement in these committees provides early insight and a detailed understanding of future road maps, including the capabilities data centres will need to deliver the most effective and efficient environments.
In my opinion, being involved in key industry committees ensures that Kao Data remains on the crest of the wave: first to gain insight into where the technology is heading, with a campus design that can be continually aligned to ensure technical excellence. Thanks to a lot of hard work and hyperscale-inspired design from my friend and colleague Gerard Thibault (CTO here at Kao Data), our campus has the structural, technical and infrastructure capabilities to accommodate and evolve with the future requirements of HPC and AI.
This blog was featured as a guest article on Computer Weekly in early July 2020.