Understanding the role of water in data centres

Why almost every data centre requires massive quantities of water to operate.

Photo Credit: Paul Mah. AMD's 2nd-gen Epyc server with DTC from 2019. Note the pipes channeling water to each CPU.

Did you know that almost every data centre today requires massive quantities of water to operate?

Humans can't do without water. And neither can data centres in most parts of the world.

Water is the lifeblood of modern data centres, used to dissipate the heat generated by powerful servers and GPUs in these facilities.

The role of water in air-cooling

Here's how it works:

  • Powerful cooling towers or chillers cool water to a low temperature. This chilled water is circulated to the various data halls within the data centre.
  • Within data halls, computer room air handlers (CRAHs) - think of them as scaled-up 'air cons' - use this cold water to cool the air in the room.
  • In each server, dozens of high-speed fans suck in cool air from the front and expel hot exhaust from the rear. The CRAHs then cool this hot air again, completing the loop.

And that's how typical data centres were designed for years. We call this air-cooling.

Need for liquid cooling

Air cooling worked while rack-level power density stayed low: the global average was 8kW per rack, according to a 2023 Uptime survey.

However, this is no longer the case in the era of AI. A full rack of 5x Nvidia DGX H100 servers draws around 50kW. Few data centres can handle this.
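To see where the 50kW figure comes from, here is a back-of-the-envelope sketch in Python. It assumes roughly 10kW per DGX H100 server, which is simply the per-server figure implied by the numbers above rather than a measured value, and the function and variable names are purely illustrative.

```python
# Back-of-the-envelope rack power estimate.
# Assumption: ~10 kW per server, implied by 50 kW / 5 servers in the text.
SERVER_POWER_KW = 10.0
SERVERS_PER_RACK = 5
AVERAGE_RACK_KW = 8.0  # global average cited from the 2023 Uptime survey


def rack_power_kw(servers: int, per_server_kw: float) -> float:
    """Total power drawn by one rack, in kW."""
    return servers * per_server_kw


ai_rack_kw = rack_power_kw(SERVERS_PER_RACK, SERVER_POWER_KW)
ratio = ai_rack_kw / AVERAGE_RACK_KW

print(f"AI rack: {ai_rack_kw:.0f} kW, {ratio:.2f}x the average rack")
```

Under these assumptions, a single AI rack draws a little over six times the power of the average rack, and all of that power ends up as heat that has to be removed.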

The solution is to bring the chilled water closer to the rack. This is liquid cooling, which leverages water's far higher heat capacity compared to air to keep systems cool.
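The heat capacity argument can be made concrete with a rough sketch based on the standard relation Q = density x flow x specific heat x temperature rise. The fluid properties below are approximate room-temperature textbook values, and the 10 K temperature rise is an assumed figure chosen only for illustration. Real designs must also account for pumps, pipe losses and heat exchanger efficiency, which this ignores.

```python
# Volumetric flow needed to carry away a heat load:
#   Q = density * flow * specific_heat * delta_T
# Fluid properties are approximate; DELTA_T_K is an assumed value.
HEAT_LOAD_W = 50_000   # the 50 kW rack from the text
DELTA_T_K = 10.0       # assumed temperature rise of the coolant

# density in kg/m^3, specific heat in J/(kg*K)
WATER = {"density": 997.0, "specific_heat": 4186.0}
AIR = {"density": 1.2, "specific_heat": 1005.0}


def flow_m3_per_s(heat_w: float, fluid: dict, delta_t_k: float) -> float:
    """Volume of fluid per second needed to absorb heat_w watts."""
    return heat_w / (fluid["density"] * fluid["specific_heat"] * delta_t_k)


water_flow = flow_m3_per_s(HEAT_LOAD_W, WATER, DELTA_T_K)
air_flow = flow_m3_per_s(HEAT_LOAD_W, AIR, DELTA_T_K)

print(f"Water: {water_flow * 1000:.2f} L/s")
print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Air needs ~{air_flow / water_flow:.0f}x the volume of water")
```

With these assumptions, the rack needs a little over a litre of water per second, versus roughly four cubic metres of air per second. That gap of a few thousand times by volume is why moving water closer to the heat source pays off.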

The 3 key types of liquid cooling are:

  • Rear Door Cooler (RDC) - Chilled water to each rack.
  • Direct to chip (DTC) - Chilled water to CPUs and/or GPUs.
  • Immersion cooling - Immersion in a liquid (not water).

Why aren't we just implementing them already, instead of talking about it? I'm glad you asked.

Pros and cons

There are various challenges to liquid cooling.

  • Immersion requires significant prep work: fans to remove, firmware to flash, thermal paste to change, and the list goes on.
  • DTC is simpler but does require a complex network of pipes that goes into individual servers. And if it leaks even a bit...

I've always thought of RDC as a compromise liquid cooling option that's not as good as DTC or immersion. However, I recently realised there are solutions on the market that can support over 90kW per rack.

It isn't perfect, but RDC could be a good fit for many use cases today (and tomorrow).

I'll dive further into RDC in another post tomorrow. Stay tuned!