The Basic Shift
Traditional data centers served web traffic and databases — millions of small, independent requests. An AI data center is built for the opposite: massive, highly-coordinated computations running across thousands of chips simultaneously to train or infer from large models.
GPU-First Architecture
The core compute
NVIDIA H100 and B200 GPUs, AMD MI300 series, or custom chips (Google TPU, AWS Trainium, Microsoft Maia) are the main processors. Each unit costs $25,000-$50,000+.
A training cluster for a frontier model uses 10,000-100,000+ GPUs running for weeks.
Networking matters as much as compute
Training requires high-bandwidth, low-latency communication between GPUs. NVIDIA NVLink connects GPUs within a server; InfiniBand or custom Ethernet fabrics connect servers into racks and clusters.
If the network is slow, GPUs sit idle — the most expensive waste possible.
Power and Cooling
Power density
A modern AI data center can use 20-80 kilowatts per rack, vs 5-10 kW for a traditional data center. A 1 GW AI facility powers roughly 800,000 US homes.
Liquid cooling
Air cooling can't remove 80 kW per rack. Direct-to-chip liquid cooling (cold plates) or immersion cooling has become standard for new builds.
Proximity to cheap power
AI data centers are being built near hydro, nuclear, and natural gas plants to minimize power costs. The location shift is driving utility investment cycles.
Scale and Cost
CapEx
A 1 GW AI data center costs $40-60 billion to build, including land, construction, power hookups, and filled with GPUs.
Depreciation
GPUs depreciate 20-30% per year due to Moore's-Law-like cadence. Operators amortize over 4-5 years typically.
OpEx
Power is the dominant operating cost — often exceeding chip depreciation over a 4-year horizon.
Who Runs AI Data Centers
Hyperscalers
Microsoft, Google, AWS, Meta operate the largest AI data centers for internal use and cloud customers.
Dedicated AI cloud providers
CoreWeave, Nebius, Crusoe, Lambda Labs specialize in GPU-as-a-service.
Model labs
OpenAI, Anthropic, xAI, Mistral, DeepSeek run large training runs, often co-located with hyperscaler infra.
Sovereign / national projects
France, UAE, Saudi Arabia, Japan are investing in domestic AI compute.
Supply Chain
- GPU supply: NVIDIA ships ~80% of AI accelerators; allocated by demand
- HBM memory: Samsung, SK Hynix, Micron control high-bandwidth memory used in GPUs
- Advanced packaging: TSMC's CoWoS packaging is a bottleneck
- Power equipment: transformers, UPS, switchgear have 18-36 month lead times
Training vs Inference
Training data centers
- Highly clustered compute
- High-bandwidth internal network critical
- Used for weeks or months at near-100% utilization
- Large single customer per cluster
Inference data centers
- More distributed (edge-deployable for latency)
- Lower networking requirements
- Utilization varies by user demand
- Multi-tenant
Future AI build-out will likely split these further.
Stocks to Watch
Covered in our [/topic/ai-infrastructure](/topic/ai-infrastructure) feed:
- NVIDIA (GPUs)
- Broadcom (custom AI ASICs, networking)
- TSMC (fab)
- ASML (EUV lithography)
- Super Micro / Dell (server integration)
- Vertiv (data-center infrastructure)
- Eaton, Schneider (power equipment)
- Constellation Energy (power supply)
Key Takeaways
- AI data centers are purpose-built for dense GPU compute
- Power density 5-10x traditional data centers
- Network bandwidth between GPUs is as critical as compute
- CapEx per facility is $40-60B for frontier builds
- Supply chain bottlenecks: GPUs, HBM memory, advanced packaging, power equipment
Browse live AI infrastructure news at [/topic/ai-infrastructure](/topic/ai-infrastructure).