Nodvolt | The Custom Silicon Revolt: Why Every Hyperscaler Is Now Designing Its Own AI Chip

June 19, 2026

NVIDIA's data centre revenue trajectory through 2024 and 2025 was a function of two simultaneous forces: extraordinary demand from hyperscalers building AI training and inference infrastructure, and the absence of any credible alternative to the H100, H200, and Blackwell B200 GPUs at the performance tier those customers required. The second condition is now eroding. Every major hyperscaler has a custom AI silicon programme in production at meaningful volume. The question for the next two years is not whether custom ASICs will reach scale (they already have) but what share of the AI infrastructure dollar they will capture relative to merchant NVIDIA GPUs.

The catalyst was simple economics. NVIDIA captures a 75 to 80% gross margin on a Blackwell GPU configuration, and its customers face HBM and CoWoS supply constraints that prevent them from receiving the unit volumes they want. Under those conditions, the unit economics of designing your own chip and contracting TSMC capacity directly become favourable above approximately 300,000 to 500,000 units per year. All four US hyperscalers, along with several Chinese cloud providers and AI labs, now generate AI workload demand above that threshold.
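The threshold logic can be sketched as a break-even calculation. This is a minimal illustration rather than the model behind the figures above: the function names and every dollar input (annual programme cost, merchant price, custom unit cost, relative throughput of the custom part) are hypothetical placeholders, chosen only so that the result lands inside the 300,000 to 500,000 range. Only the 75% gross margin comes from the text.

```python
def merchant_unit_cost(price: float, gross_margin: float) -> float:
    """Cost of goods implied by a selling price and a gross margin."""
    return price * (1.0 - gross_margin)


def break_even_units(annual_fixed_cost: float,
                     merchant_price: float,
                     custom_unit_cost: float,
                     relative_throughput: float = 1.0) -> float:
    """Annual volume at which an in-house chip matches buying merchant GPUs.

    relative_throughput scales the merchant price to the share of a merchant
    GPU's work that one custom chip replaces (1.0 means like for like).
    """
    saving_per_unit = merchant_price * relative_throughput - custom_unit_cost
    if saving_per_unit <= 0:
        raise ValueError("custom chip never breaks even at these inputs")
    return annual_fixed_cost / saving_per_unit


# Hypothetical inputs, for illustration only.
PRICE = 30_000.0        # assumed merchant GPU price (USD)
FIXED = 3.0e9           # assumed annual design and software cost (USD)
CUSTOM_COST = 12_000.0  # assumed all-in cost per custom chip (USD)

implied_cost = merchant_unit_cost(PRICE, 0.75)  # 75% margin, low end of the text's range
volume = break_even_units(FIXED, PRICE, CUSTOM_COST, relative_throughput=0.7)

print(f"implied merchant cost of goods: ${implied_cost:,.0f}")
print(f"break-even volume: {volume:,.0f} units/yr")
```

In this sketch, a higher fixed programme cost or a narrower gap between merchant price and custom unit cost pushes the break-even volume up, which is one reason a threshold of this kind is better stated as a range than as a single number.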

The four programmes and where each stands

The current state of the major hyperscaler custom silicon programmes is no longer speculative. Each programme has shipping silicon, qualified workloads, and a published roadmap to the next generation.

Google TPU: TPU v7 (Trillium successor) in volume; v8 in tape-out. Process: TSMC N3 → N2 transition. Volume: est. 2.5–3M units/yr.
AWS Trainium / Inferentia: Trainium 2 at scale; Trainium 3 sampling Q3 2026. Process: TSMC N3. Volume: est. 1.5–2M units/yr.
Microsoft Maia: Maia 200 deployed in Azure; Maia 300 in development. Process: TSMC N3 / N3P. Volume: est. 0.6–1M units/yr.
Meta MTIA: MTIA v2 in production for inference; v3 in development. Process: TSMC N5 → N3 transition. Volume: est. 0.8–1.2M units/yr.

The volumes above are estimates derived from supply chain interviews, foundry allocation disclosures, and inference fleet sizing benchmarks. The absolute numbers should not be over-interpreted, but the order of magnitude is now unambiguous: combined hyperscaler custom ASIC production in 2026 will exceed five million units, comparable to NVIDIA's projected H200 and Blackwell unit volume in the same period. Custom silicon is no longer a side bet.
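The five-million figure can be checked by summing the per-programme ranges. The short sketch below does only that arithmetic; the dictionary name and structure are illustrative, and the inputs are the estimates quoted above, not independent data.

```python
# Estimated annual unit volumes in millions, as (low, high),
# copied from the per-programme estimates above.
ESTIMATES = {
    "Google TPU": (2.5, 3.0),
    "AWS Trainium / Inferentia": (1.5, 2.0),
    "Microsoft Maia": (0.6, 1.0),
    "Meta MTIA": (0.8, 1.2),
}


def combined_range(estimates):
    """Sum the low ends and the high ends of each (low, high) range."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return round(low, 1), round(high, 1)


low, high = combined_range(ESTIMATES)
print(f"combined estimate: {low}M to {high}M units/yr")
```

The low ends alone sum to more than five million units, so the headline claim holds even if every programme lands at the bottom of its range.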

"Every hyperscaler has crossed the threshold where the engineering cost of designing their own AI chip is less than the margin they pay NVIDIA on equivalent capacity."

What custom silicon optimises for that NVIDIA cannot

The custom ASIC programmes are not attempting to outperform NVIDIA on general-purpose AI workloads. They are attempting to optimise specifically for the inference and training patterns that dominate each hyperscaler's actual production workload — and to do so at a cost-per-inference structure that NVIDIA's gross margin will never match.

Google TPU v7, manufactured on TSMC N3, is optimised for the transformer attention patterns and matrix-vector operations that dominate Gemini family inference. Google's own Search, Gmail, Google Cloud Vertex AI, and Workspace AI features run primarily on TPUs rather than NVIDIA GPUs. AWS Trainium 2 is optimised for the training patterns of models from Anthropic, Stability AI, and AWS's own Titan family. Microsoft Maia 200 is co-designed with the Azure AI Foundry workload mix, which includes OpenAI's GPT model serving and Microsoft 365 Copilot inference. Meta MTIA v2 is optimised specifically for the recommendation model inference that drives Instagram and Facebook feed ranking, the highest-volume inference workload in the world.

In each case, the custom silicon achieves a substantially lower cost-per-inference than an NVIDIA GPU for the specific workload it was designed around, even when its peak compute throughput is below that of the comparable NVIDIA part. The relevant metric is not FLOPS per chip. It is dollars per million inferences served at production latency. On that metric, every hyperscaler custom ASIC programme has achieved a 40 to 70% cost reduction versus running the same workload on NVIDIA hardware.

Estimated cost per million inferences, Llama-class 70B model serving (USD, indexed to the NVIDIA H100 baseline)

Source: Nodvolt Intelligence estimates from cloud provider pricing arbitrage, customer interview data

NVIDIA H100 baseline: $1.00 (index)
NVIDIA Blackwell B200: ~$0.68
Google TPU v7: ~$0.42
AWS Trainium 2: ~$0.48
Meta MTIA v2: ~$0.36
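The indexed figures convert directly into percentage reductions. The sketch below is a small illustrative helper (the names `COST_INDEX` and `reduction_vs` are not from the source) that computes each part's cost reduction against a chosen baseline, using only the approximate index values listed above.

```python
# Approximate cost index per million inferences (H100 = 1.00), from the figures above.
COST_INDEX = {
    "NVIDIA H100": 1.00,
    "NVIDIA Blackwell B200": 0.68,
    "Google TPU v7": 0.42,
    "AWS Trainium 2": 0.48,
    "Meta MTIA v2": 0.36,
}


def reduction_vs(baseline: str, index: dict) -> dict:
    """Percentage cost reduction of every other part relative to `baseline`."""
    base = index[baseline]
    return {
        name: round((1 - cost / base) * 100, 1)
        for name, cost in index.items()
        if name != baseline
    }


vs_h100 = reduction_vs("NVIDIA H100", COST_INDEX)
vs_b200 = reduction_vs("NVIDIA Blackwell B200", COST_INDEX)

for name, pct in vs_h100.items():
    print(f"{name}: {pct}% cheaper than H100")
```

Against the H100 baseline, the three custom parts come out 52 to 64% cheaper, inside the 40 to 70% range cited earlier. Measured against the B200 instead (`vs_b200`), the same figures imply a narrower gap of roughly 29 to 47%, so the choice of baseline matters when quoting the saving.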

Why Broadcom and Marvell are quietly the biggest winners

The companies that have benefited most from the custom silicon shift are not the hyperscalers themselves. They are the ASIC design partners that translate hyperscaler workload requirements into manufacturable silicon. Broadcom has built a custom silicon practice that includes the Google TPU programme (Broadcom has been a TPU co-design partner since the original 2015 generation), Meta MTIA design services, and several other unnamed hyperscaler programmes. Marvell Technology has a similar practice that has included co-design work for AWS Trainium and Microsoft Maia.

Broadcom's AI semiconductor revenue, which includes its custom ASIC design services and the related networking silicon that connects multi-chip systems, grew from approximately USD 4 billion in FY2023 to a projected USD 24 to 28 billion in FY2026, an acceleration that has outpaced NVIDIA's revenue growth on a percentage basis. Broadcom CEO Hock Tan has guided that the company's custom AI silicon serviceable market will reach USD 60 to 90 billion by 2027, an estimate that implicitly assumes the hyperscaler custom ASIC trajectory continues to accelerate.
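For scale, the FY2023 to FY2026 figures imply a compound annual growth rate that can be worked out directly. The helper below is a plain CAGR calculation over the three fiscal years between the two endpoints; it uses only the revenue figures quoted above and makes no claim about the intermediate years.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


# USD billions, from the paragraph above: FY2023 actual, FY2026 projected range.
START = 4.0
END_LOW, END_HIGH = 24.0, 28.0
YEARS = 3  # FY2023 -> FY2026

low = cagr(START, END_LOW, YEARS)
high = cagr(START, END_HIGH, YEARS)
print(f"implied CAGR: {low:.1%} to {high:.1%}")
```

That works out to roughly 82 to 91% a year, compounded, across the projected range.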

Nodvolt Intelligence View

Broadcom and Marvell are operating as the unbundled IP and engineering services layer for hyperscaler custom silicon in much the same way that ARM Holdings operates as the unbundled architecture layer for mobile processors. Neither company manufactures the silicon. Both capture revenue per chip designed without taking the margin compression that comes with operating a foundry.

For investors, this positioning is structurally more defensible than NVIDIA's at the current point in the AI infrastructure cycle. Broadcom's revenue grows as more hyperscalers commission more custom silicon variants. NVIDIA's revenue grows only if NVIDIA maintains share against an expanding base of internally developed alternatives.

What this means for NVIDIA's 2027 thesis

The bull case for NVIDIA through 2027 rests on two assumptions: that the merchant AI accelerator market continues to grow at 30 to 40% CAGR, and that NVIDIA retains 70 to 80% revenue share of that market through the Rubin and Rubin Ultra generations. The first assumption is reasonable. The second is where the custom silicon dynamic creates real downside risk.
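How sensitive the thesis is to the share assumption can be seen with a one-line identity: a vendor's revenue growth equals market growth multiplied by the change in its share. The sketch below applies that identity over a single year. The function name is illustrative, and the share paths are hypothetical scenarios built from the endpoints of the ranges above, not forecasts.

```python
def vendor_growth(market_growth: float, share_start: float, share_end: float) -> float:
    """Revenue growth of a vendor whose market share moves from start to end
    while the overall market grows by `market_growth` (all as fractions)."""
    return (1.0 + market_growth) * (share_end / share_start) - 1.0


# Hypothetical one-year scenarios using the ranges quoted in the text.
scenarios = {
    "30% market growth, share held at 80%": vendor_growth(0.30, 0.80, 0.80),
    "30% market growth, share slips 80% -> 70%": vendor_growth(0.30, 0.80, 0.70),
    "40% market growth, share slips 80% -> 70%": vendor_growth(0.40, 0.80, 0.70),
}

for label, growth in scenarios.items():
    print(f"{label}: {growth:.2%} revenue growth")
```

In these illustrative scenarios, a ten-point share loss cuts revenue growth from 30% to under 14% at the low end of the market growth range, which is the shape of the downside risk described above.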

NVIDIA's share through the Hopper and Blackwell generations was sustained by three advantages: superior peak compute throughput, the CUDA software ecosystem moat, and the NVLink interconnect that enables scale-up training cluster configurations no competitor has matched. The first advantage is preserved through Rubin. The second advantage is eroding as hyperscaler customers internalise their AI workloads onto custom silicon stacks where they do not need CUDA. The third advantage is more durable but is being approached by AMD's Infinity Fabric and by emerging open standards including UCIe and Ultra Ethernet Consortium specifications.

The most important variable is the workload mix. If 2027 AI compute consumption shifts further toward inference rather than training, as both Jensen Huang and the broader hyperscaler executive class have publicly forecast, the share of total compute that can be efficiently served on hyperscaler custom silicon will increase. Training workloads benefit from NVIDIA's peak throughput. Inference workloads benefit from cost-per-inference optimisation, which is what custom silicon delivers.

"The custom silicon programmes are not designed to defeat NVIDIA in benchmarks. They are designed to take inference workload off NVIDIA hardware where the unit economics favour vertical integration."

The Chinese parallel and the broader pattern

The same dynamic is reshaping the Chinese AI infrastructure market, with the additional pressure of US export controls that restrict access to NVIDIA's most advanced GPUs. Alibaba's T-Head, Baidu's Kunlun, and Huawei's Ascend programme are all custom silicon programmes operating against the same economic logic as their US hyperscaler counterparts, with the added structural advantage that they face no NVIDIA competition above defined performance thresholds in their home market.

Alibaba's T-Head 9000 series has reached production volume on Chinese domestic foundry processes. Baidu Kunlun II is now in third-generation production. Huawei Ascend has captured a share of the Chinese AI training market that NVIDIA CEO Jensen Huang recently described as "largely conceded". The pattern across all of these programmes, US and Chinese, is the same: hyperscalers are reaching the scale at which designing their own AI silicon delivers better unit economics than buying merchant GPUs.

What procurement teams should change

The practical implications for procurement teams at non-hyperscaler enterprises are significant. The narrative that "AI infrastructure means NVIDIA GPUs" was substantially true through 2024 and 2025. It is now substantially incomplete. Enterprise AI infrastructure decisions in 2026 and 2027 will increasingly involve choices between:

NVIDIA GPU on-premise or via cloud GPU services, which preserves the CUDA ecosystem and supports the broadest range of training and inference workloads but carries the full margin stack of merchant silicon.

Hyperscaler custom silicon via cloud services, which captures the cost-per-inference advantage of the custom programmes but requires porting the workload to that hyperscaler's proprietary software stack and locks it into that hyperscaler's cloud environment.

Specialty AI inference accelerators from Groq, Cerebras, SambaNova, Tenstorrent, and others, which offer differentiated inference economics for specific workload classes but carry ecosystem maturity risk and customer concentration risk.

The right choice depends on the specific workload mix, the durability of the hyperscaler vendor relationship, and the strategic value of avoiding lock-in to any single AI compute provider. The decision framework has become substantially more complex than it was in 2024, and the cost of making the wrong choice has risen in proportion to absolute AI infrastructure spend.
