16 Nov 2023
Two phrases that have become inextricably linked over the past two years are HPC (High Performance Computing) and AI (Artificial Intelligence). Each depends on the other's development to enable growth in their respective markets. Yet, under the surface, each has specific technological requirements and drivers for change.
Ahead of SC 2023 – the international conference for HPC, networking, storage, and analysis – Intersect360's latest market update featured an updated HPC-AI forecast and highlights from its newest technology survey. The session shared key insights from 163 qualified end-users, with an emphasis on areas of advanced computing undergoing rapid transition.
The findings covered a wide range of market dynamics, including the growth and accessibility of GPUs, the demand for cloud and generative AI, and changes in cooling architectures – many of which are areas of specific interest to our customers.
In this blog, we'll dive into some of the insights, sharing what they predict for the future, and whether infrastructure must be redesigned to provide a scalable, sustainable and high-performance platform for AI.
While Dell and HPE remain the top two server vendors in terms of market share, it was not unexpected to see that NVIDIA's server business is still growing. Its strong brand recognition has meant that many end-users have begun to name NVIDIA as a preferred vendor within their systems, even when using other OEMs' servers.
Our first Harlow data centre is home to NVIDIA's Cambridge-1 supercomputer – which in recent months has become part of NVIDIA DGX Cloud – and it was interesting to note the research found that the proportion of CPU usage for HPC and AI applications attributed to NVIDIA's Grace Superchip is forecast to grow rapidly over the next three years. NVIDIA Grace is positioned as a breakthrough CPU for the data centre, helping to accelerate the largest AI, HPC, cloud, and hyperscale workloads, and is expected to become the dominant ARM-based CPU solution as we move forward.
AMD, however, continues its growth trajectory, with the appearance of the AMD APU (Accelerated Processing Unit). Yet when compared to ARM's solutions, there appears to be little market interest in RISC-V – an open standard Instruction Set Architecture (ISA) enabling processor innovation through open collaboration.
The findings should also sound alarm bells for Intel. Although respondents hold both Intel and AMD CPUs in high regard, Intel's market opportunity could fall below 50%, with much of the share lost to AMD and NVIDIA Grace.
With power comes great responsibility.
Generative AI (Gen AI) is without a doubt the flavour of the month and a huge driver of growth, not just for the GPU market, but for data centres too. 46% of the survey's respondents stated they are engaged with generative AI today, and 37% said they're looking to build their own Gen AI models.
The analysts also noted that building large language models (LLMs) is not a task for the timid, and requires significant computational power whether on-premise or in the cloud. Interestingly, HPE has also expanded its GreenLake offering, allowing customers to run HPC-focused machine learning applications in the cloud without building or managing on-premise supercomputers.
The survey also asked why users are looking to invest in generative AI. One of the key benefits being sought is an increase in employee productivity, with 45% of respondents agreeing with that sentiment.
Step on the gas.
The use of accelerators is fast-gaining popularity, with 89% of respondents stating they have leveraged accelerators within their HPC and AI environments.
The accelerator market is still dominated by NVIDIA, but as with any expanding market, new entrants are chipping away to try and gain market share. According to Intersect360, NVIDIA has seen its accelerator demand drop from 92% to around 87%, and the analysts' feeling is that there are further falls to come – with current projections suggesting a fall to 79%.
In particular, AMD’s share of the market, which is currently at 5.3% based on its GPU offering, could reach over 12% in the near future.
A let up or let down?
With more device manufacturers adding accelerators to nodes, another question asked was why users aren't getting what they want out of their accelerators.
The findings stated that 19% of accelerators are not highly utilised, and a further 57% of respondents stated that their accelerators are still not used as expected.
What came as a surprise to the analysts was that HPC and AI utilisation across all industries was around 76%, which is low considering the investment that goes into such systems and the rate at which the market predicts adoption will grow.
When discussing accelerators and GPUs per node, the analysts also observed the need to look at the dynamics of HPC and AI. For example, classic scientific and deterministic computing applications that are accelerated aren't optimised for more than one or two GPUs per node.
For Machine Learning (ML) applications, eight GPUs per node is not unusual. However, according to the analysts, four GPUs per node remains the most common and balanced configuration – although some may consider this a compromise, offering too many processors for HPC and not enough for AI.
Moreover, the analysts believe that AI will become a more pervasive influence rather than being a ‘separate monolithic island’, yet what is apparent is that large volumes of processor nodes generate high heat outputs, which will require specialist cooling solutions moving forward.
Hot chips become liquid cooled.
Fast processors, and more of them, have seen rack temperatures rise to a level where air cooling is no longer effective. Liquid cooling is increasingly becoming a hot topic – excuse the pun – and 64% of respondents stated they're convinced they will transition to liquid cooling, with a further 36% stating they're already planning a pilot.
68% also recognise they need to significantly upgrade their facilities to introduce rack-scale liquid cooling – something we've also seen in the data centre sector. Therefore, HPC and AI deployments should work with a data centre operator that has advanced design and engineering capabilities, and who can facilitate direct-to-chip liquid-cooled workloads without the need for modernisation.
What is obvious from the findings is that HPC and AI use is growing at a phenomenal rate, and as such, the data centres that house these intensive computing applications must quickly evolve. This must include not only specialist IT, power and cooling provisions, but also super-fast connections to the cloud, alongside access to specialist networks.
At Kao Data, our experience tells us that innovation is never linear in its progression. As more users embrace the power of advanced workloads, and in greater numbers, the transition from legacy platforms to scalable data centres engineered for AI is essential to meet the demands of the future.
Right now, we're excited to see how the impact of generative AI will continue to influence data centre design, and to ensure that whatever comes next, our platform is ready for its arrival.