Defining the real AI data centre
What's an AI data centre? It's always been a mystery to me because any data centre can run AI.
Everyone is building an AI data centre these days.
- Microsoft's US$2.2B cloud and AI investment in Malaysia.
- YTL building an AI data centre with Nvidia in Johor.
- Singtel's AI-ready data centres in the region.
- AI use set to surge
There's no running away from AI these days. Indeed, OpenAI is also expected to launch a new capability later today that CEO Sam Altman says will "feel like magic".
Love it or hate it, AI use is projected to surge.
Until smaller AI models start running on smartphones, on edge networks, and in appliances around the home - and we are moving towards these - we need data centres for AI.
So what exactly is an AI data centre again?
- The 'AI-ready' data centre
We know that AI runs on GPUs or other specialised chips optimised for parallel processing. These draw substantial power, all the more because many chips are deployed together.
- GPUs typically consume more power than server CPUs.
- GPUs also tend to be packed eight or more per server.
- A rack of GPU servers hence consumes far more power than a conventional rack.
For instance, Nvidia's DGX H100 with 8x GPUs is rated at 10.2kW - high, considering the average global rack density was under 6kW per rack in 2023 (Uptime Institute).
The power needed for AI processing is the reason experts have expressed concern about its sustainability in the long run.
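The gap above can be checked with some back-of-envelope arithmetic. A minimal sketch - the 10.2kW and 6kW figures are from this post, while the "systems per rack" knob is hypothetical:

```python
# Rack-power arithmetic (a sketch; figures as stated in the post).

DGX_H100_KW = 10.2        # Nvidia's rating for one 8-GPU DGX H100 system
AVG_RACK_KW_2023 = 6.0    # average global rack density in 2023 (Uptime)

def rack_power_kw(systems_per_rack: int) -> float:
    """Total draw of a rack holding several DGX H100 systems (hypothetical layout)."""
    return systems_per_rack * DGX_H100_KW

def vs_average(kw: float) -> float:
    """Multiple of the 2023 global average rack density."""
    return kw / AVG_RACK_KW_2023

# Even a single DGX per rack already exceeds the 2023 average by ~70%.
print(vs_average(rack_power_kw(1)))
```

Stacking two or three such systems per rack, as dense AI deployments do, pushes the multiple well past 3x the average.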
- A dance of configuration
For the immediate term though, it is worth noting that data centres are not immutable facilities but are designed with a degree of flexibility.
For example:
- Fan walls can be installed.
- Power can be diverted from other racks.
- Additional heat exchangers can be installed.
Assuming a good price, data centre operators will want to secure the business, hosting fewer low-powered racks in favour of AI workloads.
So unless it's badly out-of-date, every data centre can conceivably support AI - it's simply a dance of economics and upgrading.
- Defining the AI data centre
Today, AI readiness is defined by the kW per rack of a facility. But that's a limited view for the reasons I outlined above.
Moreover, GPU power requirements keep increasing and a "high" kW rating today might be mediocre tomorrow.
To properly identify data centres optimised for AI, perhaps we should flip everything around and put the spotlight on factors such as energy and space use instead.
And ask questions such as:
- How many GPUs can we run per kW? (Efficiency)
- How many GPUs deployed per sq ft? (Land space)
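The two flipped metrics above can be sketched as simple ratios. The facility numbers below are made-up placeholders for illustration, not figures from any real data centre:

```python
# The "flipped" AI-readiness metrics proposed above (a sketch; the
# example facility below is entirely hypothetical).

def gpus_per_kw(total_gpus: int, facility_kw: float) -> float:
    """Efficiency: GPUs a facility runs per kW of total power."""
    return total_gpus / facility_kw

def gpus_per_sqft(total_gpus: int, floor_sqft: float) -> float:
    """Land use: GPUs deployed per square foot of floor space."""
    return total_gpus / floor_sqft

# Hypothetical facility: 1,024 GPUs at ~0.7 kW each, with an assumed
# 1.5x overhead factor for cooling and power distribution.
facility_kw = 1024 * 0.7 * 1.5
print(gpus_per_kw(1024, facility_kw))
print(gpus_per_sqft(1024, 5000))
```

Higher values on both ratios would favour the facility - the same framing lets an older, well-optimised site compete with a new build that merely advertises a big kW-per-rack number.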
What do you think? I look forward to hearing your thoughts.