If you spend long enough inside capital programs, you start collecting failure stories. The petrochemical project that was sanctioned at $1.2B and finished at $1.9B. The LNG facility that started up two years late. The transmission build-out that quietly got de-scoped to half its original capacity to hit budget. After a while the stories start sounding the same, and the obvious question becomes: how much of this is bad luck, and how much of it is something repeatable that the industry could just stop doing?
The honest answer, after twenty years of empirical work by McKinsey, IPA, Bent Flyvbjerg's group at Oxford, and several others, is that a stunning amount of it is repeatable. Megaprojects fail in patterned, predictable ways — and the patterns have been documented for long enough that there is no longer much excuse for sanctioning a billion-dollar project without modeling them explicitly. This essay walks through what the data actually says about megaproject failure, where the popular explanations are wrong, and what a capital committee can do about it on Monday morning.
The headline numbers
Across the major datasets, the failure rates for capital projects above $1B converge on a tight band. Roughly nine out of ten run over budget. Roughly eight out of ten run over schedule. Cost overruns average 30-50% of the sanctioned estimate, with the right tail dominated by a handful of projects that come in at multiples of the original number. Schedule overruns average 30-40% of the sanctioned duration, and the same right-tail dynamic holds.
What's striking is how stable these numbers are across industries — oil and gas, transportation, hydropower, nuclear, IT — and across decades. Flyvbjerg's reference class, which now includes thousands of projects spanning sixty years, shows essentially no improvement in megaproject performance over time. Whatever the industry has been doing since the 1960s to manage capital better has not moved the aggregate.
The instinctive response to a number like "90% of megaprojects run over budget" is to assume the estimates were poorly done and that better engineering at FEED would have closed the gap. The data does not support that. Estimating methods and tools have improved enormously over sixty years; the overrun rates have not come down. Something else is going on.
Three popular explanations that don't survive the data
When a project blows up, the post-mortem narrative tends to converge on one of three explanations: "the contractor underperformed," "the scope changed," or "we got unlucky on a major risk event." All three are real. None of them, on their own or in combination, explains the aggregate.
"The contractor underperformed"
Contractor performance varies, and a bad contractor on a project can cost meaningful money. But the variation attributable to contractor performance is narrow relative to the spread of overall project outcomes. If you re-ran a 90%-overrun project with a top-quartile contractor, you would typically still see a project that overran — by 50% instead of 90%, perhaps, but still squarely in the failure column. Contractor selection is a lever; it is not the lever.
"The scope changed"
Scope changes are real, and a project that grew 40% in scope post-sanction is going to cost more than the original number. But the data shows that even projects that delivered to the original scope ran systematically over the sanctioned cost. Stripping out scope-change projects from the dataset reduces overruns by maybe 10-15 percentage points. The remaining overrun — call it 30-40% on average — is the part nobody can blame on scope creep.
"We got unlucky"
This is the most slippery of the three because it has the texture of statistical sophistication. Big projects do encounter genuine bad luck — a regulatory change, a supplier bankruptcy, a hurricane. But "unlucky" is a one-sided story. If projects were genuinely subject to symmetric uncertainty, half of them would come in under budget. They don't. The skew is the tell: the same risks that occasionally produce a 200% overrun do not, in any meaningful frequency, produce a 50% underrun. Whatever the underlying generating process is, it is not symmetric noise around an unbiased estimate.
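To make the skew argument concrete, here is a minimal simulation sketch in Python. The parameters are illustrative assumptions loosely calibrated to the headline numbers above, not fitted to any dataset: symmetric, unbiased noise around the estimate puts roughly half of projects under budget, while a right-skewed generating process puts almost none there.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# World A: symmetric, unbiased noise around the sanctioned estimate.
# Outcome = estimate * (1 + e), e ~ Normal(0, 0.25).
symmetric = 1.0 + rng.normal(0.0, 0.25, n)

# World B: a right-skewed process calibrated (by assumption) so ~90% of
# outcomes exceed the estimate and the mean overrun is roughly 40%.
sigma = 0.24
mu = 1.2816 * sigma          # z_0.90 * sigma puts the 10th percentile at the estimate
skewed = rng.lognormal(mu, sigma, n)

for name, outcome in [("symmetric noise", symmetric), ("skewed process", skewed)]:
    print(f"{name:15s}  under budget: {np.mean(outcome < 1.0):5.1%}   "
          f"mean overrun: {np.mean(outcome) - 1.0:+6.1%}")
```

The symmetric world produces roughly as many under-runs as overruns; the skewed world reproduces the one-sided pattern the datasets actually show.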
What actually explains the aggregate
The empirical literature points to two structural drivers that, together, explain most of the cross-project variance.
1. Optimism bias in the sanction case
This is Flyvbjerg's central finding, and twenty years of additional data have strengthened rather than weakened it. The cost and schedule estimates that go into sanction packages are systematically optimistic relative to what comparable projects have actually delivered. The bias is not random — it has a sign and a magnitude that is consistent across industries and decades. For megaprojects, the median sanctioned cost estimate underestimates the eventual delivered cost by something like 25-30%. The schedule estimate underestimates eventual duration by something like 20-25%.
The mechanism is partly cognitive (the planning fallacy — humans systematically underestimate the time and cost of complex tasks) and partly institutional (the project champion needs the number to look good to get the project sanctioned, and the sanction committee has weak incentives to push back on a champion's number that looks plausible). Whatever the mechanism, the consequence is that the sanctioned cost and schedule are not unbiased estimates of the eventual outcome. They are systematically biased low.
2. The deterministic fallacy
The second structural driver is that the sanctioned plan is almost always presented as a single deterministic number — "$1.2B and 36 months" — rather than as a distribution. A deterministic number is not a forecast. It is, at best, a P50 with a wide and unstated band. When the project comes in at $1.6B, the committee is surprised; but the committee should not have been surprised, because $1.6B was always inside the plausible distribution. The committee was surprised because it had been shown a number, not a distribution.
This matters because contingency, risk reserves, and stage-gate decisions all key off the sanctioned number. If the sanctioned number is acknowledged as a P50, the appropriate contingency on top of it might be 30%. But in practice the P50 gets treated as a firm commitment ("we are committing to $1.2B"), which implies something closer to P90 confidence, and the contingency needed to make that commitment credible is far larger than what was provisioned. The committee is implicitly under-provisioning the project from day one.
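To put rough numbers on that under-provisioning, here is an illustrative calculation. The lognormal spread is an assumption chosen only to sit in the ballpark of the overrun statistics above; what matters is the gap between the P50 and the higher percentiles, not the specific figures.

```python
import numpy as np

rng = np.random.default_rng(1)
p50_estimate = 1.20   # $B, the sanctioned point number
sigma = 0.25          # assumed lognormal spread of delivered cost around the P50

# Simulated delivered-cost distribution with its median at the sanctioned number.
outcomes = p50_estimate * rng.lognormal(0.0, sigma, 500_000)

for p in (50, 80, 90, 95):
    q = np.percentile(outcomes, p)
    print(f"P{p}: ${q:0.2f}B   contingency above the P50 number: {q / p50_estimate - 1:4.0%}")
```

Under these assumptions the P90 sits roughly 40% above the P50, which is the provisioning gap a committee inherits when it commits to the point number.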
What a capital committee can do
The hopeful version of the failure data is that the dominant drivers are operationally addressable. Both optimism bias and the deterministic fallacy can be corrected with practices that are well documented in the literature and require no new technology to implement.
Reference-class forecasting
The single highest-leverage practice from the empirical literature is reference-class forecasting: instead of estimating a project bottom-up from its activity list (which is where optimism bias enters), estimate it top-down by comparing it to a reference class of similar completed projects and using their actual cost and schedule outcomes as priors. Done properly, this typically produces sanction numbers 20-30% above what a bottom-up estimate would produce — and matches the empirical outcome distribution far better. UK government projects above £1B are now formally required to apply reference-class adjustments, and the early evidence is that delivered overruns have started to come down. Probabilistic project planning is the operationalization of this practice.
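In outline, the mechanics are straightforward. The sketch below uplifts a bottom-up estimate using the empirical distribution of actual-to-sanctioned cost ratios from a reference class of completed projects. The ratios and the `reference_class_forecast` helper are placeholders for illustration, not data or code from any real dataset or tool.

```python
import numpy as np

def reference_class_forecast(bottom_up_estimate, reference_ratios, percentile=80):
    """Uplift a bottom-up estimate to a chosen percentile of the reference class."""
    uplift = np.percentile(reference_ratios, percentile)
    return bottom_up_estimate * uplift

# Placeholder reference class: actual cost / sanctioned cost for completed comparable projects.
ratios = np.array([1.05, 1.12, 1.18, 1.25, 1.31, 1.38, 1.46, 1.60, 1.85, 2.40])

bottom_up = 1.20   # $B, the project team's activity-based estimate
for p in (50, 80):
    print(f"P{p} reference-class forecast: ${reference_class_forecast(bottom_up, ratios, p):0.2f}B")
```

The choice of percentile is a governance decision, not a statistical one: it is the committee saying how much confidence it wants before it puts the number in the sanction package.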
Sanction on a distribution, not a number
The companion practice is to present the sanction case as a distribution rather than a point estimate. P10/P50/P80/P95 cost numbers, with the variance drivers behind each, force the committee to think about what it is actually committing to. "We are sanctioning at the P80 of $1.45B with a $200M contingency" is a different conversation than "we are sanctioning at $1.2B." The first acknowledges the empirical shape of capital project outcomes; the second is the shape of every capital project that subsequently overran. Monte Carlo project simulation is the standard tool for producing the distribution.
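A minimal version of that simulation can be sketched in a few lines. The work packages, ranges, and the shared market factor below are illustrative assumptions, not a description of any particular project or of how any specific tool builds the distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# (low, mode, high) cost ranges in $B per work package — placeholder values only.
packages = {
    "site & civils":   (0.15, 0.20, 0.30),
    "major equipment": (0.40, 0.45, 0.60),
    "construction":    (0.30, 0.35, 0.55),
    "commissioning":   (0.08, 0.10, 0.18),
    "owner's costs":   (0.08, 0.10, 0.15),
}

# A shared escalation factor so package risks do not diversify away unrealistically.
market = rng.lognormal(0.0, 0.08, n)

total = np.zeros(n)
for low, mode, high in packages.values():
    total += rng.triangular(low, mode, high, n)
total *= market

for p in (10, 50, 80, 95):
    print(f"P{p}: ${np.percentile(total, p):0.2f}B")
```

The output is the P10/P50/P80/P95 table the committee sanctions against, with each work package's range available as the variance driver behind it.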
Stage-gate decisions on the live forecast
The third practice is to keep the probabilistic forecast live throughout execution, not just at sanction. As actuals come in, the distribution updates, and the committee can re-evaluate at each stage gate. The classic failure mode is that the deterministic plan ages and decays through execution, and by the time the committee notices the project is in trouble, the rescue options have closed off. A live distribution surfaces the divergence months earlier and preserves option value.
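One simple way to keep the distribution live, sketched below with placeholder numbers, is to replace completed work with booked actuals, scale the remaining packages by the cost performance observed to date, and re-simulate the estimate at completion at each gate.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Completed packages with actual costs booked so far ($B) — placeholder values.
actuals = {"site & civils": 0.26, "major equipment": 0.52}
planned_to_date = 0.65          # original most-likely cost of the completed packages

# Remaining packages as (low, mode, high) ranges in $B — placeholder values.
remaining = {
    "construction":  (0.30, 0.35, 0.55),
    "commissioning": (0.08, 0.10, 0.18),
    "owner's costs": (0.08, 0.10, 0.15),
}

# Scale remaining work by the cost performance observed on completed work.
cpi_to_date = sum(actuals.values()) / planned_to_date

eac = np.full(n, sum(actuals.values()))      # estimate at completion, per trial
for low, mode, high in remaining.values():
    eac += rng.triangular(low, mode, high, n) * cpi_to_date

for p in (50, 80, 95):
    print(f"updated P{p} estimate at completion: ${np.percentile(eac, p):0.2f}B")
```

Re-running something like this at every gate is what turns "the plan has decayed" from a retrospective finding into a forward-looking signal the committee can still act on.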
The uncomfortable corollary
The uncomfortable corollary of the failure data is that most of what goes wrong on capital projects was knowable at sanction. Not in the sense that any specific project was destined to fail — individual project outcomes are genuinely uncertain — but in the sense that the aggregate was knowable, and any specific sanctioned plan was already deep inside a known distribution of failure modes. The empirical literature has been telling capital committees this for twenty years. The implementation gap is what Capital Project AI exists to close.
If you are running a capital program and the sanctioned plan is a single number rather than a distribution, you are operating below the practice frontier. The math to fix it is not new. The institutional discipline to act on it is what's hard.
See the failure-rate math applied to your portfolio
The Capital Project AI platform applies reference-class forecasting and probabilistic outcome distributions to your sanctioned plans — surfacing optimism bias before the committee signs off.
Open the Dashboard →