New AI Compute Infrastructure White Paper: The Rise of Liquid Cooling, Optical Interconnects, and Specialized Chips
Preface:
In 2025, anyone walking into a newly built data center would be shocked by the scene: no roar of fans, no dense forest of network cables.
Instead, servers boil quietly, submerged in fluorinated liquid, while laser signals flash between racks.

With the exponential growth of large-model parameter counts, the compute bottleneck has shifted from "Calculation" to "Interconnect" and "Heat Dissipation." This white paper drills down to the physical layer and dissects the hardware foundation supporting the AI 2.0 era.
Chapter 1: Interconnect Wall: The Inevitability of Optics Replacing Copper
In the H100 era, we still used direct-attach copper (DAC) cables to connect GPUs within a rack. But with today's trillion-parameter models trained in parallel, copper is running up against its physical limits.
1.1 The Explosion of Silicon Photonics
In 2025, CPO (Co-Packaged Optics) technology finally matured for mass production.
- Principle: Previously, optical modules were pluggable units on the switch faceplate, tens of centimeters of lossy electrical trace away from the chip, so a large share of signal power was burned just getting on and off the board. CPO technology packages the Optical Engine directly on the GPU chip substrate.
- Benefits:
- Power Reduction ~50%: Electrical signals no longer have to drive long, lossy traces to the faceplate.
- Bandwidth Density Boost: Single-chip I/O bandwidth exceeds 51.2 Tbps, easing the "fast compute, slow transfer" interconnect bottleneck (a back-of-envelope power sketch follows below).
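To make the power claim concrete, the sketch below compares interconnect power for a pluggable-optics design and a co-packaged design at the bandwidth cited above. The energy-per-bit figures are illustrative assumptions, not vendor specifications.

```python
# Back-of-envelope comparison of interconnect power: pluggable optics vs. CPO.
# All energy-per-bit figures below are illustrative assumptions, not vendor specs.

PLUGGABLE_PJ_PER_BIT = 15.0   # assumed: long electrical trace + retimer + pluggable module
CPO_PJ_PER_BIT = 7.0          # assumed: optical engine co-packaged on the substrate

IO_BANDWIDTH_TBPS = 51.2      # switch-class aggregate I/O bandwidth cited above

def io_power_watts(pj_per_bit: float, tbps: float) -> float:
    """Power = energy-per-bit * bits-per-second."""
    bits_per_second = tbps * 1e12
    return pj_per_bit * 1e-12 * bits_per_second

p_pluggable = io_power_watts(PLUGGABLE_PJ_PER_BIT, IO_BANDWIDTH_TBPS)
p_cpo = io_power_watts(CPO_PJ_PER_BIT, IO_BANDWIDTH_TBPS)

print(f"Pluggable optics:   {p_pluggable:.0f} W for I/O")
print(f"Co-packaged optics: {p_cpo:.0f} W for I/O "
      f"({100 * (1 - p_cpo / p_pluggable):.0f}% lower)")
```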
1.2 All-Optical Switching Network
Google's Jupiter data center architecture showed the future direction: OCS (Optical Circuit Switches).
- Traditional electrical switches must convert optical signals to electrical, process them, and convert them back to optical (O-E-O), which adds latency and burns power.
- An OCS uses tiny MEMS mirrors to steer light beams directly from input port to output port. Light in, light out: the added latency is negligible, and essentially no energy is spent processing the signal itself. A toy model of this follows below.
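A minimal sketch of how an OCS routes traffic, assuming nothing beyond the MEMS-mirror description above: the switch's only state is a port-to-port mapping (the mirror configuration), and forwarding involves no per-bit electronic processing. The class and port numbers are purely illustrative.

```python
# Minimal sketch of an optical circuit switch (OCS): routing is a physical
# port-to-port mapping set by MEMS mirror angles, not per-packet processing.
# This toy model only illustrates the control plane; the light itself is
# never converted to an electrical signal inside the switch.

class OpticalCircuitSwitch:
    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.circuit = {}  # input port -> output port, i.e. the mirror configuration

    def configure(self, mapping: dict[int, int]) -> None:
        """Reconfigure mirrors. Takes milliseconds in real MEMS hardware,
        but once set, traffic flows with no per-bit switching work."""
        if len(set(mapping.values())) != len(mapping):
            raise ValueError("two inputs cannot target the same output port")
        self.circuit = dict(mapping)

    def forward(self, in_port: int) -> int:
        """Light entering in_port is reflected straight to the mapped output."""
        return self.circuit[in_port]

ocs = OpticalCircuitSwitch(num_ports=8)
ocs.configure({0: 5, 1: 4, 2: 7})   # e.g. connect GPU pods for an all-reduce phase
print(ocs.forward(0))               # -> 5, no O-E-O conversion in the data path
```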
Chapter 2: Heat Dissipation Revolution: From Air to Liquid
When single-chip TDP (Thermal Design Power) exceeds 1,000 W (as with Blackwell B200), even brick-sized air-cooled heatsinks can no longer keep the silicon within its temperature limits.
2.1 Popularization of Cold Plate Liquid Cooling
This is currently the mainstream transitional solution.
- Scheme: A copper cold plate is clamped tightly to the GPU package, and coolant circulating through its channels carries the heat away.
- Challenge: Leak risk. A coolant leak can write off an entire server. Hence the rise of Negative Pressure Systems in 2025: loop pressure is kept below ambient, so if a pipe ruptures, air is drawn in instead of liquid spilling out. A rough flow-rate sizing sketch follows below.
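As a rough sizing exercise, the sketch below uses the basic energy balance P = ṁ · c_p · ΔT to estimate how much coolant flow a cold plate loop needs per kilowatt of chip power. The fluid properties and the 10 K temperature rise are assumed values chosen only for illustration.

```python
# Rough sizing sketch for a cold plate loop: how much coolant flow is needed
# to carry away a given chip power at a chosen temperature rise.
# Fluid properties and the 10 K temperature rise are illustrative assumptions.

CHIP_POWER_W = 1000.0               # ~TDP class discussed above
SPECIFIC_HEAT_J_PER_KG_K = 4186.0   # water-like coolant (assumed)
DENSITY_KG_PER_L = 1.0              # assumed
DELTA_T_K = 10.0                    # allowed coolant temperature rise (assumed)

# Energy balance: P = m_dot * c_p * delta_T  =>  m_dot = P / (c_p * delta_T)
mass_flow_kg_s = CHIP_POWER_W / (SPECIFIC_HEAT_J_PER_KG_K * DELTA_T_K)
volume_flow_l_min = mass_flow_kg_s / DENSITY_KG_PER_L * 60

print(f"Required flow: {mass_flow_kg_s * 1000:.1f} g/s "
      f"≈ {volume_flow_l_min:.2f} L/min per 1 kW chip")
```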
2.2 The Endgame of Immersion Cooling
This is the real future.
- Single-Phase Immersion: Servers are fully submerged in a dielectric oil, and heat is carried away by convection of the liquid.
- Two-Phase Immersion: Servers are submerged in a fluorinated (fluorocarbon) fluid. The liquid boils on hot surfaces and turns into vapor (the phase change absorbs a large latent heat); the vapor rises to the lid, condenses back to liquid, and drips down.
- PUE (Power Usage Effectiveness): Traditional air-cooled facilities run at a PUE of roughly 1.5, while two-phase immersion can push PUE down to about 1.02. Almost all of the electricity then goes into computation rather than air conditioning; the arithmetic is worked through below.
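The sketch below works through what those PUE figures mean for a hypothetical 10 MW IT load. The facility size is an assumption; the PUE values are the ones quoted above.

```python
# Worked example of the PUE figures quoted above.
# PUE = total facility power / IT equipment power, so for the same IT load
# the overhead (cooling, power delivery) is what changes.

IT_LOAD_MW = 10.0          # assumed IT load of an AI cluster
PUE_AIR = 1.5              # typical air-cooled facility (from the text)
PUE_TWO_PHASE = 1.02       # two-phase immersion (from the text)

facility_air = IT_LOAD_MW * PUE_AIR
facility_immersion = IT_LOAD_MW * PUE_TWO_PHASE

print(f"Air-cooled facility draw:  {facility_air:.1f} MW")
print(f"Two-phase immersion draw:  {facility_immersion:.1f} MW")
print(f"Overhead saved:            {facility_air - facility_immersion:.1f} MW "
      f"({(facility_air - facility_immersion) / facility_air * 100:.0f}% of total)")
```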
Chapter 3: Chip Architecture: Counterattack of ASIC
GPUs are general-purpose, but in inference, generality means waste.
3.1 Wafer-Scale Engine
Cerebras takes an extremely radical route: Not cutting the wafer.
- Traditional chips are small dies cut from a wafer. Cerebras turns an entire 12-inch (300 mm) wafer into a single chip with roughly 850,000 cores.
- Advantage: Core-to-core communication stays entirely on the wafer, with aggregate bandwidth thousands of times that of off-chip GPU interconnects. That is what lets it achieve extremely low latency on very large-model inference even at Batch Size = 1; a simplified bandwidth-bound sketch follows below.
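The sketch below illustrates why batch-size-1 decoding rewards enormous on-chip bandwidth, using a simplified weight-streaming model: each generated token must move essentially all model weights past the compute units, so throughput is bandwidth-bound. The model size and bandwidth figures are assumptions for illustration, not Cerebras or GPU specifications, and the model ignores on-chip weight residency and other real-world effects.

```python
# Why batch size 1 rewards enormous bandwidth: each decoded token must stream
# essentially all weights past the compute units, so per-token latency is
# bandwidth-bound. All numbers below are illustrative assumptions.

MODEL_PARAMS = 70e9            # assumed 70B-parameter model
BYTES_PER_PARAM = 2            # assumed FP16/BF16 weights

def tokens_per_second(weight_bandwidth_gb_s: float) -> float:
    """Upper bound on batch-1 decode rate if weight streaming is the bottleneck."""
    bytes_per_token = MODEL_PARAMS * BYTES_PER_PARAM
    return weight_bandwidth_gb_s * 1e9 / bytes_per_token

for label, bw_gb_s in [("single GPU HBM (~3 TB/s, assumed)", 3_000),
                       ("wafer-scale on-chip fabric (~100x, assumed)", 300_000)]:
    print(f"{label:45s}: ~{tokens_per_second(bw_gb_s):.0f} tokens/s at batch size 1")
```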
3.2 Processing-in-Memory (PIM)
The original sin of the von Neumann architecture is the separation of compute and memory. In memory-bound AI workloads, moving data between the two can dominate the energy budget, by some estimates up to 90%.
- PIM Technology: Simple compute logic is integrated directly inside the DRAM dies, so the computation happens where the data already lives.
- Application: Well suited to basic AI operations such as matrix multiplication. Precision is limited, but the potential in edge-inference scenarios is large; a back-of-envelope energy comparison follows below.
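To illustrate the data-movement argument, the sketch below compares the energy spent fetching weights from off-chip DRAM against the energy of the arithmetic itself, for a naive matrix-vector product with no on-chip reuse. The per-access and per-MAC energy figures are rough, commonly cited orders of magnitude and should be read as assumptions.

```python
# Back-of-envelope illustration of why "compute where the data is" helps:
# fetching an operand from off-chip DRAM costs orders of magnitude more energy
# than the arithmetic performed on it. Figures are assumed rough orders of magnitude.

PJ_PER_DRAM_ACCESS_32B = 640.0   # assumed: off-chip DRAM read of a 32-bit word
PJ_PER_MAC_32B = 4.0             # assumed: 32-bit multiply-accumulate on chip

# A naive matrix-vector product y = W @ x with no on-chip reuse:
# every weight is fetched from DRAM once and used in exactly one MAC.
ROWS, COLS = 4096, 4096
macs = ROWS * COLS
dram_accesses = ROWS * COLS      # one weight fetch per MAC in the worst case

e_move = dram_accesses * PJ_PER_DRAM_ACCESS_32B
e_compute = macs * PJ_PER_MAC_32B

print(f"Data movement:  {e_move / 1e9:.1f} mJ")
print(f"Arithmetic:     {e_compute / 1e9:.2f} mJ")
print(f"Movement share: {e_move / (e_move + e_compute) * 100:.0f}% of total energy")
```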
Chapter 4: Green Computing: The Straitjacket of Carbon Emissions
AI is a power guzzler. In 2025, the ability to secure energy became the primary factor in data center site selection.
4.1 Follow the Source
Data centers are migrating from first-tier cities to Inner Mongolia, Guizhou, and even Iceland.
- Wherever there is cheap wind or hydro power, compute is built there.
- Microsoft has even experimented with putting data centers under the sea (Project Natick), using seawater as a vast heat sink.
4.2 Heat Recovery
Data centers in Europe have begun to take on district heating duties.
- AI chips throw off enormous amounts of heat, so why not capture that waste heat to warm nearby residential communities? This both cuts carbon emissions and creates an extra revenue stream; a rough scale estimate follows below.
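For a sense of scale, the sketch below estimates how many homes the waste heat of a mid-sized facility could serve. The facility size, recoverable fraction, and per-home heat demand are all assumed values, and real deployments need heat pumps to upgrade the low-grade heat.

```python
# Rough scale check for district-heat reuse: nearly all of a data center's
# electrical input leaves as low-grade heat. All figures below are assumptions
# chosen for illustration, so treat the result as an upper-bound sketch.

DATA_CENTER_IT_LOAD_MW = 10.0       # assumed facility size
RECOVERABLE_FRACTION = 0.7          # assumed share of heat actually captured
AVG_HOME_HEAT_DEMAND_KW = 1.5       # assumed seasonal average per household

recoverable_heat_kw = DATA_CENTER_IT_LOAD_MW * 1000 * RECOVERABLE_FRACTION
homes_heated = recoverable_heat_kw / AVG_HOME_HEAT_DEMAND_KW

print(f"Recoverable heat: {recoverable_heat_kw / 1000:.1f} MW "
      f"-> roughly {homes_heated:,.0f} homes' worth of heating demand")
```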
Conclusion
The competition in compute infrastructure has evolved into an all-out race across materials science, fluid dynamics, and optics.
In this arms race there is no such thing as a "performance surplus": software, the models themselves, devours compute greedily, and every hardware advance is instantly absorbed by larger, smarter models.
This document was prepared by the Hardware Group of the Augmunt Institute for Frontier Technology, based on global semiconductor supply chain surveys conducted in 2025.
