AI training now happens in purpose-built facilities that look and operate differently from traditional data centers. Here is how AI data centers work.

The Basic Shift

Traditional data centers served web traffic and databases: millions of small, independent requests. An AI data center is built for the opposite: massive, highly coordinated computations running across thousands of chips simultaneously to train large models or run inference on them.

GPU-First Architecture

The core compute

NVIDIA H100 and B200 GPUs, AMD MI300 series, or custom chips (Google TPU, AWS Trainium, Microsoft Maia) are the main processors. Each unit costs $25,000-$50,000+.

A training cluster for a frontier model uses 10,000-100,000+ GPUs running for weeks.
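To get a feel for these ranges, the arithmetic can be sketched in a few lines. This is a rough illustration only: the function name `cluster_estimate` and the run lengths of 4 and 12 weeks are assumptions chosen for the example, and the unit prices are simply the ends of the range quoted above.

```python
def cluster_estimate(num_gpus, unit_cost_usd, weeks):
    """Return (GPU hardware cost in USD, total GPU-hours) for one training run."""
    hardware_cost = num_gpus * unit_cost_usd
    gpu_hours = num_gpus * weeks * 7 * 24  # 168 hours per week
    return hardware_cost, gpu_hours


# Hypothetical low and high ends of the ranges in the text.
scenarios = {
    "low": cluster_estimate(10_000, 25_000, weeks=4),
    "high": cluster_estimate(100_000, 50_000, weeks=12),
}

for name, (cost, hours) in scenarios.items():
    print(f"{name}: ${cost / 1e9:.2f}B in GPUs, {hours / 1e6:.1f}M GPU-hours")
```

The estimate covers accelerators only; networking, servers, and the building itself come on top.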

Networking matters as much as compute

Training requires high-bandwidth, low-latency communication between GPUs. NVIDIA NVLink connects GPUs within a server; InfiniBand or custom Ethernet fabrics connect servers into racks and clusters.

If the network is slow, GPUs sit idle — the most expensive waste possible.
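A simplified model shows why. The sketch below uses the standard bandwidth-only cost of a ring all-reduce, in which each GPU sends and receives about 2(N-1)/N times the payload, and assumes communication does not overlap with compute. The gradient size, step time, GPU count, and link speeds are hypothetical values chosen for illustration; real training stacks overlap communication with compute and use more elaborate topologies.

```python
def ring_allreduce_seconds(payload_bytes, num_gpus, link_bytes_per_s):
    """Bandwidth-only time for a ring all-reduce (per-message latency ignored)."""
    if num_gpus < 2:
        return 0.0
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes / link_bytes_per_s


def gpu_busy_fraction(compute_s, comm_s):
    """Share of each step spent computing, assuming no compute/communication overlap."""
    return compute_s / (compute_s + comm_s)


grad_bytes = 20e9  # hypothetical: 10B parameters at 2 bytes each
compute_s = 1.0    # hypothetical compute time per training step

for gbit_per_s in (100, 400, 3200):
    link_bytes_per_s = gbit_per_s * 1e9 / 8
    comm_s = ring_allreduce_seconds(grad_bytes, 1024, link_bytes_per_s)
    busy = gpu_busy_fraction(compute_s, comm_s)
    print(f"{gbit_per_s:>5} Gbit/s: comm {comm_s:.2f} s, GPUs busy {busy:.0%}")
```

Under these assumptions, the slower the link, the larger the share of every step in which the GPUs wait on the network.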

Power and Cooling

Power density

A modern AI data center can draw 20-80 kilowatts per rack, versus 5-10 kW per rack in a traditional data center. A 1 GW AI facility consumes about as much electricity as roughly 800,000 US homes.
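These figures can be sanity-checked with a short calculation. The sketch assumes an average US household uses about 10,500 kWh per year (an approximation, not a figure from this article), that the facility runs at full load all year, and that all of its power goes to racks, ignoring cooling and other overhead.

```python
HOURS_PER_YEAR = 8760
HOME_KWH_PER_YEAR = 10_500  # assumption: approximate average US household use


def racks_supported(facility_watts, rack_kw):
    """Number of racks a facility could power if all power went to racks."""
    return facility_watts / (rack_kw * 1_000)


def homes_equivalent(facility_watts, home_kwh_per_year=HOME_KWH_PER_YEAR):
    """Households whose annual use matches the facility running at full load."""
    facility_kwh_per_year = facility_watts / 1_000 * HOURS_PER_YEAR
    return facility_kwh_per_year / home_kwh_per_year


one_gw = 1e9
print(f"racks at 80 kW: {racks_supported(one_gw, 80):,.0f}")
print(f"racks at 20 kW: {racks_supported(one_gw, 20):,.0f}")
print(f"home equivalent: {homes_equivalent(one_gw):,.0f}")
```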

Liquid cooling

Air cooling can't remove 80 kW per rack. Direct-to-chip liquid cooling (cold plates) or immersion cooling has become standard for new builds.
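The gap between air and liquid follows from the heat-transport relation Q = m * c_p * dT. The sketch below compares the volume of coolant needed to carry away 80 kW at the same temperature rise. The fluid properties are textbook approximations, and the 10 K rise is an assumption picked for the comparison, not a design figure.

```python
def volumetric_flow_m3_per_s(heat_watts, delta_t_kelvin, specific_heat, density_kg_per_m3):
    """Coolant volume flow needed to absorb heat_watts with a delta_t_kelvin rise."""
    mass_flow_kg_per_s = heat_watts / (specific_heat * delta_t_kelvin)
    return mass_flow_kg_per_s / density_kg_per_m3


RACK_HEAT_W = 80_000
DELTA_T_K = 10  # assumption: coolant temperature rise across the rack

# Approximate properties: specific heat in J/(kg*K), density in kg/m^3.
water = volumetric_flow_m3_per_s(RACK_HEAT_W, DELTA_T_K, 4186.0, 1000.0)
air = volumetric_flow_m3_per_s(RACK_HEAT_W, DELTA_T_K, 1005.0, 1.2)

print(f"water: {water * 60_000:.1f} L/min")
print(f"air:   {air:.2f} m^3/s")
print(f"air needs about {air / water:,.0f}x the volume flow of water")
```

At this heat load, air would need thousands of times the volume flow of water, which is the practical reason dense racks move to cold plates or immersion.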

Proximity to cheap power

AI data centers are being built near hydro, nuclear, and natural gas plants to minimize power costs. This shift in siting is prompting new rounds of utility investment.

Scale and Cost

CapEx

A 1 GW AI data center costs $40-60 billion to build, including land, construction, power hookups, and the GPUs that fill it.

Depreciation

GPUs lose 20-30% of their value per year because new chip generations arrive on a Moore's-Law-like cadence. Operators typically amortize them over 4-5 years.
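The two views of depreciation can be compared directly. This sketch treats the 20-30% annual loss as a declining-balance rate and the 4-5 year amortization as straight-line; that pairing, and the function names, are assumptions made for illustration.

```python
def declining_balance_value(initial, annual_rate, years):
    """Remaining value when a fixed fraction of value is lost each year."""
    return initial * (1 - annual_rate) ** years


def straight_line_value(initial, life_years, years):
    """Remaining book value under straight-line amortization (floored at zero)."""
    return max(0.0, initial * (1 - years / life_years))


price = 30_000  # hypothetical unit price within the range quoted earlier

for year in range(6):
    market = declining_balance_value(price, 0.25, year)
    book = straight_line_value(price, 5, year)
    print(f"year {year}: market-style ${market:>9,.0f}  book ${book:>9,.0f}")
```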

OpEx

Power is the largest recurring operating cost. At typical electricity prices, though, it still totals far less than chip depreciation over a 4-year horizon.
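How the power bill compares with chip depreciation depends on the electricity price and on how much of the build cost is chips. The sketch below plugs in hypothetical inputs: $0.08 per kWh, continuous full-load operation, a $50 billion build with half of it spent on GPUs, and a 5-year straight-line life. None of these inputs come from this article except the 1 GW size and the build-cost range.

```python
HOURS_PER_YEAR = 8760


def power_cost_usd(facility_gw, usd_per_kwh, years):
    """Electricity cost for a facility running at full load the whole time."""
    kwh = facility_gw * 1e6 * HOURS_PER_YEAR * years  # 1 GW = 1e6 kW
    return kwh * usd_per_kwh


def chip_depreciation_usd(capex_usd, gpu_share, life_years, years):
    """Straight-line depreciation of the GPU portion of the build cost."""
    return capex_usd * gpu_share * min(years, life_years) / life_years


power = power_cost_usd(1, usd_per_kwh=0.08, years=4)
chips = chip_depreciation_usd(50e9, gpu_share=0.5, life_years=5, years=4)

print(f"4-year power cost:        ${power / 1e9:.1f}B")
print(f"4-year chip depreciation: ${chips / 1e9:.1f}B")
```

Changing the price per kWh or the GPU share shifts the balance, so the inputs are worth replacing with figures for a specific site.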

Who Runs AI Data Centers

Hyperscalers

Microsoft, Google, AWS, and Meta operate the largest AI data centers, both for internal use and for cloud customers.

Dedicated AI cloud providers

CoreWeave, Nebius, Crusoe, and Lambda Labs specialize in GPU-as-a-service.

Model labs

OpenAI, Anthropic, xAI, Mistral, and DeepSeek conduct large training runs, often co-located with hyperscaler infrastructure.

Sovereign / national projects

France, the UAE, Saudi Arabia, and Japan are investing in domestic AI compute.

Supply Chain

GPU supply: NVIDIA ships roughly 80% of AI accelerators, and units are allocated among customers according to demand
HBM memory: Samsung, SK Hynix, and Micron control the high-bandwidth memory used in GPUs
Advanced packaging: TSMC's CoWoS packaging is a bottleneck
Power equipment: transformers, UPS systems, and switchgear have 18-36 month lead times

Training vs Inference

Training data centers

Highly clustered compute
High-bandwidth internal network critical
Used for weeks or months at near-100% utilization
Large single customer per cluster

Inference data centers

More distributed (edge-deployable for latency)
Lower networking requirements
Utilization varies by user demand
Multi-tenant

Future AI build-outs will likely separate these two kinds of facility further.

Stocks to Watch

Covered in our [/topic/ai-infrastructure](/topic/ai-infrastructure) feed:

NVIDIA (GPUs)
Broadcom (custom AI ASICs, networking)
TSMC (fab)
ASML (EUV lithography)
Super Micro / Dell (server integration)
Vertiv (data-center infrastructure)
Eaton, Schneider (power equipment)
Constellation Energy (power supply)

Key Takeaways

AI data centers are purpose-built for dense GPU compute
Power density is 5-10x that of traditional data centers
Network bandwidth between GPUs is as critical as compute
CapEx is $40-60B for a 1 GW frontier build
Supply chain bottlenecks: GPUs, HBM memory, advanced packaging, power equipment

Browse live AI infrastructure news at [/topic/ai-infrastructure](/topic/ai-infrastructure).

What Is an AI Data Center? Inside the GPU Factories Powering Modern AI