What Boards Get Wrong About Foundation Model Concentration Risk
I have sat in enough board AI briefings to recognize a pattern. The board asks about AI risk. The CTO talks about data privacy, hallucinations, and EU regulations. Everyone nods. The board feels informed. And the single largest structural risk in the organization's AI posture goes completely unaddressed: three companies in San Francisco supply the intelligence layer that now runs inside your most critical business processes, and you have no contractual guarantee that their models will stay available, no migration plan, and no visibility into when those models will change.
This is foundation model concentration risk. It is not a niche technical concern. It is a board-level governance failure hiding behind the vocabulary of vendor management. The frameworks enterprises use to think about third-party risk were built for a world of deterministic software with contractual SLAs and auditable code. They do not map onto a world where the core intelligence component of your enterprise stack is a non-deterministic statistical function operated by a startup that did not exist ten years ago.
What follows is a precise account of the four misconceptions I encounter most often in board conversations about this risk, the data that corrects each one, and the five governance actions that actually change the exposure. None of them require a technology replacement. All of them can begin this quarter.
Why This Risk Is Different From Every Third-Party Risk You Have Managed Before
Enterprise risk committees are experienced at managing third-party dependency. They assess financial stability, audit security controls, require SOC 2 certifications, and negotiate SLAs. This framework works well for software vendors, cloud providers, and professional services firms. It fails almost completely for foundation model providers.
The failure has three technical roots. First, foundation models are non-deterministic. The same prompt sent to GPT-4o twice can return two different outputs. You cannot write a test suite that proves the model will behave correctly tomorrow, because correct behavior is probabilistic, not guaranteed. Second, providers update models regularly without publishing the kind of change logs that allow downstream systems to assess impact. Unless you pin a specific dated version, the model behind an API endpoint today may not be the model that was behind that endpoint six months ago. Third, performance is task-specific. A model that scores highly on aggregate benchmarks like MMLU may perform worse on your specific contract extraction task than a model with lower benchmark scores. There is no standard measure of "better" that transfers reliably from provider marketing material to your production workflow.
These three properties mean that the standard enterprise risk management toolkit, built around auditable, deterministic, contractually bounded software, has no native answer to foundation model dependency. Boards that apply the old framework to this new problem consistently underestimate their exposure.
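The first property, non-determinism, can at least be measured even though it cannot be tested away. The following is a minimal sketch, not a definitive method: it sends one prompt repeatedly and reports how often the most common answer comes back. The function name `agreement_rate` and the stubbed `fake_model` are assumptions made for illustration; a real provider client would take the stub's place.

```python
import random
from collections import Counter
from typing import Callable


def agreement_rate(model: Callable[[str], str], prompt: str, runs: int = 20) -> float:
    """Return the fraction of runs that produced the single most common output."""
    outputs = [model(prompt) for _ in range(runs)]
    top_count = Counter(outputs).most_common(1)[0][1]
    return top_count / runs


# Hypothetical stand-in for a provider call; a real client would replace this.
_rng = random.Random(0)


def fake_model(prompt: str) -> str:
    # Simulates sampling noise: most runs agree, some do not.
    return _rng.choice(["APPROVE", "APPROVE", "APPROVE", "REJECT"])


rate = agreement_rate(fake_model, "Does clause 7 permit early termination?", runs=50)
print(f"agreement rate over 50 runs: {rate:.2f}")
```

Tracked per workflow over time, a number like this gives the risk committee something concrete to watch: a sudden drop after a provider-side update is a signal that the model behind the endpoint has changed.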
The Four Misconceptions
The first misconception is some version of "We use Azure, so we're fine." It is the most common misconception I encounter, and it is the most dangerous. Azure OpenAI Service does run on Microsoft infrastructure, and Microsoft does provide the enterprise SLA for uptime and data handling. But the underlying model - GPT-4o, GPT-4 Turbo, GPT-3.5 - is OpenAI's intellectual property, trained and controlled by OpenAI. Microsoft does not determine when a model is deprecated. Microsoft does not control what changes OpenAI makes to a model's behavior between versions. Microsoft cannot guarantee that the model your application depends on will exist in 18 months. The Azure relationship gives you enterprise-grade infrastructure around the model. It does not give you control over the model itself.
The second misconception is that moving to a successor model is a routine upgrade, similar to moving from one version of a database to the next. It is not. When a foundation model is deprecated and replaced by a successor, the successor's outputs are not guaranteed to be backward-compatible with the applications built on the predecessor. Every prompt template, every output parser, every downstream validation rule, and every human review workflow calibrated to the previous model's behavior must be re-evaluated and often re-engineered. At enterprise scale - where AI is embedded in dozens of workflows across multiple business units - a forced migration is a multi-month engineering project, not a configuration change. OpenAI gave six months' notice before deprecating text-davinci-003. That sounds reasonable until you count how many production systems in a large enterprise touch a single model endpoint.
The third misconception is that switching providers is easy because the APIs look alike. This is technically true and operationally false. The API call to Claude looks similar to the API call to GPT-4o. But the output characteristics of each model are different enough that a prompt engineered for one will not reliably produce usable output from the other without significant rework. Enterprise teams spend months calibrating system prompts, few-shot examples, temperature settings, and output schemas for a specific model's behavior. That calibration is not transferable. It is institutional knowledge embedded in production code that must be rebuilt, tested, and re-validated for every model switch. The switching cost is not the API key. The switching cost is the engineering time, the evaluation effort, and the regression testing required to establish that the new model meets the same quality bar across all production use cases. For a large enterprise with 20+ AI-enabled workflows, that cost runs into millions of dollars and multiple quarters of disruption.
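The regression testing described here can be sketched as a "golden set" check: a list of reviewed prompt and answer pairs that any candidate model must pass at roughly the same rate as the current one. This is an illustration under stated assumptions, not a complete evaluation harness. The names `GoldenCase`, `pass_rate`, and `ready_to_switch`, the exact-match comparison, the 2-point tolerance, and the sample data are all hypothetical; real workflows usually need task-specific scoring.

```python
from dataclasses import dataclass
from typing import Callable, List

Model = Callable[[str], str]  # takes a prompt, returns the model's text output


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str  # reviewed, known-good answer for this workflow


def pass_rate(model: Model, cases: List[GoldenCase]) -> float:
    """Share of golden cases where the normalized output matches exactly."""
    hits = sum(
        1 for c in cases if model(c.prompt).strip().lower() == c.expected.lower()
    )
    return hits / len(cases)


def ready_to_switch(current: Model, candidate: Model, cases: List[GoldenCase],
                    tolerance: float = 0.02) -> bool:
    """True if the candidate scores within `tolerance` of the current model."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - tolerance


# Illustrative data only: three cases and two stubbed models.
cases = [
    GoldenCase("Governing law?", "new york"),
    GoldenCase("Notice period?", "30 days"),
    GoldenCase("Auto-renewal?", "yes"),
]
current_answers = {"Governing law?": "New York", "Notice period?": "30 days",
                   "Auto-renewal?": "Yes"}
# The candidate is "right" but phrases one answer differently, so a parser
# calibrated to the current model's format would reject it.
candidate_answers = {"Governing law?": "New York",
                     "Notice period?": "Thirty (30) days",
                     "Auto-renewal?": "Yes"}


def current_model(prompt: str) -> str:
    return current_answers[prompt]


def candidate_model(prompt: str) -> str:
    return candidate_answers[prompt]


print(pass_rate(current_model, cases))
print(ready_to_switch(current_model, candidate_model, cases))
```

The stubbed candidate fails on formatting rather than on substance, which is the typical shape of a migration problem: the new model is not worse, it is different, and everything downstream was tuned to the old one.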
The fourth misconception is that having an AI use policy means the risk is covered. AI use policies address what employees can and cannot do with AI tools. They do not address infrastructure dependency risk. The question of whether your organization has an acceptable use policy for AI is completely separate from the question of whether your organization would survive a 90-day forced migration off its primary foundation model provider. The first is a compliance question. The second is a business continuity question. Most enterprise AI risk frameworks in existence today were written to address the first. Almost none address the second. NIST's AI Risk Management Framework, published in January 2023 and supplemented in July 2024 by a Generative AI Profile, treats third-party and supply chain risk, which is where model provider concentration falls, as a distinct risk category requiring its own governance controls. Most enterprises have not implemented those controls.
What the Precedents Tell Us
Concentration risk in critical infrastructure is not a hypothetical concern. The precedents are well-documented, and they follow a consistent pattern: the risk looks manageable until the event occurs, at which point the dependency becomes visible and the cost of the scramble far exceeds what mitigation would have required.
The TSMC precedent
Taiwan Semiconductor Manufacturing Company produces roughly 90% of the world's most advanced semiconductor chips. For decades, this concentration was understood but tolerated because TSMC was reliable, the geopolitical risk seemed remote, and building alternative capacity was expensive. The US CHIPS Act, signed in August 2022, committed $52.7 billion in federal subsidies to domestic semiconductor manufacturing precisely because Congress concluded that a forced dependency on a single geographic concentration of critical infrastructure was an unacceptable national risk. The enterprise AI equivalent is not a nation-state. It is an industry-wide dependency on three companies. The mitigation is not $52 billion in government subsidies. It is an architecture decision your engineering team can make this quarter.
The cloud concentration lesson
When enterprises moved workloads to cloud in the 2010s, most chose a single provider for simplicity. By 2019, most large enterprises were actively building multi-cloud strategies, not because their primary provider had failed, but because they recognized that single-provider dependency was an unacceptable operational and negotiating position. The analogy to foundation models is precise: the window to build the abstraction layer is now, before a forced migration makes the cost visible and the timeline non-negotiable.
The SolarWinds supply chain event
In December 2020, a compromised software update from SolarWinds propagated malicious code to approximately 18,000 organizations including US government agencies and Fortune 500 companies. The attack vector was trust in a third-party software supplier. Foundation model providers sit in a structurally similar position: they are trusted third parties whose outputs are consumed without independent verification by systems that take real business actions. A compromised or deliberately modified model update could propagate incorrect outputs through every downstream enterprise system before any organization detected the change. The MITRE ATLAS framework catalogs this class of attack as ML Supply Chain Compromise, technique AML.T0010.
The Governance Actions That Actually Reduce Exposure
I want to be precise here. The goal is not to eliminate foundation model usage. The competitive advantage that comes from AI-enabled workflows is real and significant. The goal is to structure that usage so the organization retains operational control and the ability to respond to provider-side changes without a crisis response.
There are five concrete governance actions that materially reduce concentration risk. Each one can be initiated without waiting for a risk event to make the case.
- Build and mandate a model abstraction layer. Every production AI application must connect to an internal API gateway that routes to the underlying provider, not directly to the provider's endpoint. This gateway abstracts the provider and model version from the consuming application. When a model is deprecated or a provider needs to be switched, only the gateway configuration changes. Applications are unaffected. This is the single highest-leverage engineering investment for reducing concentration risk. It is also the investment that is hardest to retrofit once dozens of applications are already calling provider APIs directly. The time to build it is before the event that makes it necessary.
- Establish and enforce a provider concentration limit. Define a threshold - for example, no single provider may supply more than 60% of critical AI workload volume, where critical means any system whose failure would materially affect revenue, regulatory compliance, or customer safety. Measure this quarterly. Report it to the risk committee. If you do not measure it, it will drift toward 100% concentration as teams default to the most capable and familiar provider.
- Maintain live secondary model integrations. For every critical AI workflow, maintain a tested, validated integration with a secondary model from a different provider. This integration must be exercised in production regularly - even at low volume - to ensure it stays current. A secondary integration that has not received traffic in six months will not be ready when you need it. The secondary does not need to match the primary's quality on every task. It needs to be good enough to keep the business running during a migration.
- Run annual deprecation drills. Select one critical AI workflow each year and simulate a forced migration: take the primary model offline for a defined window and require the team to operate on the secondary. This exercise reveals the actual migration cost, identifies gaps in the secondary integration, and builds organizational muscle that reduces the response time when a real deprecation notice arrives. Teams that have rehearsed the migration complete it in weeks. Teams that have not complete it in quarters.
- Require deprecation response plans for every production AI system. For each system in the model inventory, document: the primary model, the secondary model, the estimated engineering effort to migrate, the business continuity plan for the migration window, and the escalation path if the secondary is also unavailable. This document should be reviewed and updated whenever the underlying model changes. Its existence forces the conversation about dependency before a crisis makes that conversation urgent.
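The abstraction layer in the first action can be sketched in a few dozen lines. This is a minimal illustration, assuming each provider is wrapped in an adapter function that either returns text or raises an error. The names `ModelGateway`, `Route`, and `ProviderUnavailable`, and the two stubbed providers, are hypothetical and do not correspond to any vendor's SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Provider = Callable[[str], str]  # adapter: takes a prompt, returns text


class ProviderUnavailable(Exception):
    """Raised by an adapter when the upstream model cannot serve a request."""


@dataclass(frozen=True)
class Route:
    primary: str
    secondary: str


class ModelGateway:
    """Applications name a workflow; the gateway decides which provider serves it."""

    def __init__(self, providers: Dict[str, Provider], routes: Dict[str, Route]):
        self.providers = providers
        self.routes = routes
        self.served_by: List[str] = []  # audit trail of which provider answered

    def complete(self, workflow: str, prompt: str) -> str:
        route = self.routes[workflow]
        for name in (route.primary, route.secondary):
            try:
                result = self.providers[name](prompt)
            except ProviderUnavailable:
                continue  # fall through to the next provider on the route
            self.served_by.append(name)
            return result
        raise ProviderUnavailable(f"no provider available for {workflow!r}")


# Stubbed adapters: the primary simulates a retired model.
def provider_a(prompt: str) -> str:
    raise ProviderUnavailable("model retired")


def provider_b(prompt: str) -> str:
    return f"[provider_b] summary of: {prompt}"


gateway = ModelGateway(
    providers={"provider_a": provider_a, "provider_b": provider_b},
    routes={"contract-extraction": Route("provider_a", "provider_b")},
)
answer = gateway.complete("contract-extraction", "Master services agreement")
print(answer)
print(gateway.served_by)
```

The calling application never names a provider, so a switch is a change to `routes`. A production gateway would also need to hold provider-specific prompt templates and output parsers, since prompts calibrated for one model do not transfer to another.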
The Question That Reveals Your Actual Exposure
There is a single question that cuts through board-level abstractions and reveals the actual state of your organization's concentration risk in under five minutes. It is this: If OpenAI announced today that all GPT-4 variants would be deprecated in 90 days, which of our production systems would fail, what would the business impact be per day of outage, and who owns the migration?
I have asked this question in board sessions and executive team meetings across a range of industries. The answers fall into three categories. A small number of organizations - roughly 10 to 15% in my experience - can answer it immediately, specifically, and with a documented plan. These organizations have already built the abstraction layer, the model inventory, and the deprecation playbooks. They are not immune to concentration risk, but they have converted an existential exposure into a manageable operational event.
The majority of organizations cannot answer it precisely. They know they use OpenAI. They do not know which systems depend on which model versions. They do not have a migration plan. They have not estimated the engineering cost. They have not identified the teams who would execute the migration. This profile, which describes the majority of enterprise AI deployments today, amounts to a significant hidden liability: concentration risk that has not been quantified, disclosed, or managed.
A third category gives a confident answer that turns out to be wrong when pressed. "We use Azure, so we're fine." "Our vendor manages that." "We can switch providers in a few weeks." These answers reflect the four misconceptions I described above. They are the most dangerous category because they signal that the organization believes it has addressed the risk when it has not.
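Organizations in the first category can answer the 90-day question quickly because the answer is a lookup against a model inventory. A rough sketch of that lookup follows. The record fields mirror the deprecation response plan described earlier, while the names `SystemRecord` and `exposure`, the system names, and every figure in the sample inventory are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SystemRecord:
    name: str
    primary_model: str
    secondary_model: Optional[str]  # None means no tested fallback
    owner: Optional[str]            # None means nobody owns the migration
    daily_impact_usd: int           # estimated cost per day of outage
    migration_weeks: int            # estimated engineering effort


def exposure(inventory: List[SystemRecord], model_prefix: str) -> Dict[str, object]:
    """Summarize what a deprecation of models named `model_prefix*` would hit."""
    hit = [s for s in inventory if s.primary_model.startswith(model_prefix)]
    return {
        "affected_systems": [s.name for s in hit],
        "daily_impact_usd": sum(s.daily_impact_usd for s in hit),
        "no_secondary": [s.name for s in hit if s.secondary_model is None],
        "no_owner": [s.name for s in hit if s.owner is None],
    }


# Illustrative inventory; every name and number here is made up.
inventory = [
    SystemRecord("claims-triage", "gpt-4o", "provider-b-large", "claims-eng", 120_000, 10),
    SystemRecord("contract-extraction", "gpt-4-turbo", None, None, 40_000, 16),
    SystemRecord("support-summaries", "provider-b-large", "gpt-4o", "support-eng", 15_000, 6),
]

report = exposure(inventory, "gpt-4")
for key, value in report.items():
    print(f"{key}: {value}")
```

The systems that appear under `no_secondary` or `no_owner` are the ones that turn a deprecation notice into a crisis.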
The good news is that the mitigation is not prohibitively expensive or technically complex. A model abstraction layer is a well-understood software pattern. Provider diversification is a strategy most large enterprises already apply to cloud infrastructure. Deprecation planning is standard practice for any critical dependency. The gap is not capability. The gap is that these practices have not been applied to AI infrastructure because AI infrastructure has not yet been recognized as critical infrastructure. That recognition is the first step. The governance mandate that follows from it (the model inventory, the abstraction layer, the concentration limits, and the deprecation drills) is the board's job to demand and the executive team's job to deliver.
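Of these practices, the concentration limit is the simplest to put in front of a risk committee because it reduces to arithmetic on request volumes. The sketch below uses the 60% threshold given as an example earlier. The function names `provider_shares` and `breaches` and the workload figures are assumptions made for illustration, and "volume" could equally be measured in spend or in number of critical systems.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def provider_shares(workloads: List[Tuple[str, int]]) -> Dict[str, float]:
    """Map each provider to its share of total critical workload volume."""
    totals: Dict[str, int] = defaultdict(int)
    for provider, volume in workloads:
        totals[provider] += volume
    grand_total = sum(totals.values())
    return {provider: volume / grand_total for provider, volume in totals.items()}


def breaches(shares: Dict[str, float], limit: float = 0.60) -> List[str]:
    """Providers whose share of critical volume exceeds the limit."""
    return sorted(p for p, share in shares.items() if share > limit)


# Illustrative quarterly figures: (provider, critical requests served).
workloads = [
    ("provider_a", 700_000),
    ("provider_a", 100_000),
    ("provider_b", 200_000),
]

shares = provider_shares(workloads)
for provider, share in shares.items():
    print(f"{provider}: {share:.0%}")
print("over limit:", breaches(shares))
```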
Primary Sources
- NIST AI Risk Management Framework (AI RMF 1.0), National Institute of Standards and Technology, January 2023
- MITRE ATLAS: ML Supply Chain Compromise (AML.T0010), MITRE Corporation, 2024
- OpenAI Platform Documentation: Model Deprecations, OpenAI, 2024
- Semiconductor Industry Association, CHIPS Act Overview and Domestic Manufacturing Investment Rationale, 2022
- CISA Advisory AA20-352A: Advanced Persistent Threat Compromise of Government Agencies (SolarWinds), December 2020
- Bommasani et al., "On the Opportunities and Risks of Foundation Models," Stanford CRFM / arXiv, 2021
- Gartner, Cloud Strategy Research: Multi-Cloud Adoption Patterns in Enterprise, 2023
- Andreessen Horowitz, State of AI 2025: Enterprise Model Usage and Provider Concentration, 2025