How to predict cost overruns before they happen

By Dr. Abhishek Kar · April 23, 2026 · ~12 minute read

Reference-class forecasting and Monte Carlo simulation have been in the project-management toolbox for forty years. Every textbook covers them. Every certification curriculum tests them. Every major capital owner has somebody on staff who can run them in a spreadsheet. And yet sanctioned capital projects still routinely come in at 1.4× their P50 estimate, with the right tail of the outcome distribution producing the 200%+ overruns that make the trade press.

The gap between "the math exists" and "the math gets used to drive the sanction decision" is the entire point of this essay. The methods are straightforward. The institutional discipline to apply them honestly — and to write the output back into the sanction case in a form the committee actually engages with — is what's missing. This piece walks through what predicting cost overruns actually looks like when done right, the four places it most often goes wrong in practice, and how to write the result back into the meeting where the decision is made.

The two methods, briefly

Reference-class forecasting

Reference-class forecasting (RCF) starts from the empirical fact that capital projects are not snowflakes. A 700 MW combined-cycle gas plant, an LNG train, an ethylene cracker, a 50-mile pipeline — each of these has a reference class of comparable completed projects whose actual cost and schedule outcomes are knowable. RCF says: estimate your project not by summing up its activities (which is where bias enters) but by comparing it to the reference class and using the empirical distribution of outcomes from that reference class as your prior.

If the reference class shows a mean cost overrun of 32% and a P80 cost overrun of 55%, your starting point for sanction should be the bottom-up estimate adjusted up by something like that magnitude. If your bottom-up number says $1.0B and the reference-class adjustment says you should be at $1.32B for the P50, you have a defensible reason to push the sanction case to $1.32B with a contingency band that gets you to the P80 of $1.55B.
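A minimal sketch of that arithmetic, assuming you have actual-vs-sanctioned cost ratios for a reference class of completed comparable projects (the ratios below are illustrative, not real data):

    import numpy as np

    # Hypothetical reference class: actual-vs-sanctioned cost ratios for
    # comparable completed projects. Illustrative values, chosen so the
    # P50 and P80 factors roughly match the worked numbers above.
    overrun_ratios = np.array([1.04, 1.10, 1.17, 1.22, 1.27, 1.30,
                               1.34, 1.43, 1.51, 1.56, 1.80, 2.15])

    bottom_up = 1.00e9  # the team's $1.0B bottom-up estimate

    p50_factor = np.percentile(overrun_ratios, 50)
    p80_factor = np.percentile(overrun_ratios, 80)

    print(f"P50 sanction case:   ${bottom_up * p50_factor / 1e9:.2f}B")
    print(f"P80 contingency to:  ${bottom_up * p80_factor / 1e9:.2f}B")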

Monte Carlo simulation

Monte Carlo simulation (MCS) is the complementary technique. RCF gives you the location of the outcome distribution; MCS gives you the shape. Each cost line item and each schedule activity gets a probability distribution rather than a point estimate, the model runs the project thousands of times sampling from those distributions, and the output is the full empirical distribution of project-level cost and schedule outcomes — P10, P50, P80, P95.

Done well, MCS surfaces three things a deterministic plan cannot: the variance, the variance drivers, and the correlations. Variance tells the committee how wide the outcome band is. Variance drivers identify which 3-5 inputs are doing most of the work. Correlations reveal that "independent" risks are not independent — labor productivity and weather, for example, are correlated through season — and the project's true variance is wider than naïve summation would suggest.
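A minimal sketch of the mechanics, assuming independent lognormal cost line items for now (correlations are addressed below); the line items and parameters are illustrative, not a real estimate:

    import numpy as np

    rng = np.random.default_rng(0)
    n_runs = 10_000

    # Illustrative cost line items: (median cost in $M, lognormal sigma).
    line_items = {
        "civil_works":     (180, 0.20),
        "major_equipment": (420, 0.15),
        "piping":          (150, 0.30),
        "electrical":      (110, 0.25),
        "labor":           (260, 0.35),
    }

    # Sample each line item from a lognormal around its median, sum per run.
    samples = np.column_stack([
        median * rng.lognormal(mean=0.0, sigma=sigma, size=n_runs)
        for median, sigma in line_items.values()
    ])
    total_cost = samples.sum(axis=1)

    for p in (10, 50, 80, 95):
        print(f"P{p}: ${np.percentile(total_cost, p):,.0f}M")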

Four places this goes wrong in practice

The methods are well documented and well taught. The reason sanctioned plans continue to overrun is that the application of the methods, in practice, breaks down at four predictable places.

1. Inputs are guessed, not calibrated

The honest version of MCS requires that each input distribution — say, the cost of structural steel, or the duration of a major piping spool fabrication — be calibrated against historical data on similar work, not just guessed by the estimator. In practice, the input distributions are often "the estimator's best guess plus or minus 20%." This produces output distributions that are far too narrow, because the input distributions are too narrow. The MCS run technically converges, but on a distribution that doesn't reflect the empirical scatter of real projects.

The fix is calibration: every input distribution should be benchmarked against the firm's historical actuals (or, where the firm doesn't have enough history, against industry reference data). When this is done, input distributions typically widen by 50-100%, and the project-level P80 moves up by 15-25%. This is not pessimism — it is the empirical truth that the firm has been ignoring.
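One way to do that calibration, sketched below under simple assumptions: fit a lognormal to the historical actual-vs-estimate ratios for comparable line items and read the band off the fit rather than off the estimator's habit. The history array is illustrative:

    import numpy as np

    # Historical (estimated, actual) cost pairs for comparable line items,
    # in $M. Illustrative numbers.
    history = np.array([
        (10.0, 11.8), (4.2, 4.1), (22.0, 29.5), (7.5, 9.0),
        (15.0, 14.2), (3.3, 4.4), (9.0, 12.1), (18.0, 21.6),
    ])
    ratios = history[:, 1] / history[:, 0]   # actual / estimate

    # Fit a lognormal to the empirical ratios; compare its spread with the
    # estimator's habitual "plus or minus 20%" band.
    mu, sigma = np.log(ratios).mean(), np.log(ratios).std(ddof=1)
    print(f"Calibrated median ratio: {np.exp(mu):.2f}")
    print(f"Calibrated P10-P90 band: "
          f"{np.exp(mu - 1.28 * sigma):.2f} to {np.exp(mu + 1.28 * sigma):.2f}")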

2. Correlations are ignored

The default MCS setup treats every input as independent. This is mathematically convenient and almost always wrong. Steel and concrete are correlated through commodity cycles. Labor productivity on adjacent activities is correlated through site supervision quality. Weather delays cascade through downstream activities. Treating these as independent under-counts variance materially — typical understatement is 20-40% on the project-level standard deviation.

The fix is to model correlations explicitly. A Cholesky-decomposition or copula-based MCS handles this; the implementation is not hard once the correlation matrix is specified. Specifying the correlation matrix is the work, and it is work the project team has to do honestly. Monte Carlo project simulation done right always includes the correlation step.
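A sketch of the Cholesky approach, assuming lognormal marginals and a coarse, illustrative correlation matrix (this is effectively a simple Gaussian copula); the comparison against the independent run shows how much variance the default setup leaves out:

    import numpy as np

    rng = np.random.default_rng(1)
    n_runs = 10_000

    # Illustrative inputs: (median $M, lognormal sigma) for three cost drivers.
    medians = np.array([420.0, 180.0, 260.0])   # steel, concrete, labor
    sigmas  = np.array([0.20, 0.18, 0.35])

    # Coarse correlation matrix: steel and concrete move together through
    # commodity cycles; labor is mildly correlated with both.
    corr = np.array([[1.0, 0.6, 0.3],
                     [0.6, 1.0, 0.3],
                     [0.3, 0.3, 1.0]])

    # Sample correlated standard normals via the Cholesky factor, then map
    # them through each input's lognormal marginal.
    L = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_runs, 3)) @ L.T
    total_correlated = (medians * np.exp(sigmas * z)).sum(axis=1)

    # The same inputs treated as independent, for comparison.
    z_ind = rng.standard_normal((n_runs, 3))
    total_independent = (medians * np.exp(sigmas * z_ind)).sum(axis=1)

    print(f"Std dev, correlated:  ${total_correlated.std():,.0f}M")
    print(f"Std dev, independent: ${total_independent.std():,.0f}M")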

3. The optimism bias correction gets dropped

RCF says: take the bottom-up estimate, multiply by the reference-class adjustment factor, and use the result as the sanction case. In practice, the bottom-up estimate is what the project team has spent six months building, and there is enormous institutional resistance to "just multiplying it by 1.3 because Flyvbjerg said so." The adjustment gets quietly omitted, or applied at half-strength, or replaced by the project team's "expert judgment" that this project is different.

It is almost never different. The reference class exists for a reason. The discipline to apply the adjustment in full — and to do it as a formal step in the sanction process, not as an optional sensitivity — is the single highest-leverage practice in this whole essay. Every percentage point shaved off the optimism-bias adjustment moves the sanction case closer to the failure-mode distribution.

4. The output is not written back into the sanction case

The fourth and most common failure mode is that the MCS run gets done, the chart of P10/P50/P80 outcomes gets produced, and then the deterministic plan goes into the sanction package anyway. The committee sees "$1.2B and 36 months" in the executive summary, and the probabilistic chart is in an appendix nobody reads.

The fix is structural: the sanction case has to be presented as the distribution. Not as a footnote, not as an appendix — as the headline number. "We are recommending sanction at the P80 cost of $1.45B with a delivered-completion P80 of 41 months. The P50 is $1.32B and 37 months. The variance drivers are X, Y, Z." That sentence is what the committee should be voting on. If the sanction package leads with a single number, the work that produced the distribution has been wasted.

The variance-decomposition step

One element of MCS that is often under-used is variance decomposition. After the simulation runs, you can compute how much of the project-level variance is explained by each input variable. The result is almost always that 3-5 inputs explain 60-80% of the total variance. This is enormously useful — it tells the project team where to spend its risk-mitigation effort.
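A sketch of one simple way to do the decomposition after the run: take the squared correlation of each input column with the project total, which is a coarse first-order stand-in for full Sobol indices. The sample matrix below is re-created for illustration; in practice you would reuse the matrix from the simulation itself:

    import numpy as np

    rng = np.random.default_rng(2)
    names   = ["major_equipment", "piping", "electrical", "labor"]
    medians = np.array([420.0, 150.0, 110.0, 260.0])
    sigmas  = np.array([0.15, 0.30, 0.25, 0.35])

    # Illustrative (n_runs, n_inputs) sample matrix and project totals.
    samples = medians * rng.lognormal(mean=0.0, sigma=sigmas, size=(10_000, 4))
    total_cost = samples.sum(axis=1)

    # Squared correlation of each input with the total = its approximate
    # share of project-level variance (for roughly independent inputs).
    for name, col in zip(names, samples.T):
        share = np.corrcoef(col, total_cost)[0, 1] ** 2
        print(f"{name:16s} explains ~{share:5.1%} of project cost variance")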

If steel cost variance accounts for 30% of project cost variance, hedging steel becomes a high-leverage move. If labor productivity accounts for 25%, investing in tier-1 supervision becomes a high-leverage move. If permitting timeline accounts for 20%, accelerating regulatory engagement becomes high-leverage. Without the decomposition, the project team tends to spread mitigation effort evenly across the risk register — which is enormously inefficient because most of the register's items contribute negligible variance.

The corollary is that the long-tail items in the risk register are often best left alone. They are real risks, but their contribution to project-level variance is small enough that the cost of buying them down exceeds the value of the variance reduction they deliver. Putting management attention on them at the expense of the high-leverage 3-5 is a common mis-allocation.

Live forecasting through execution

Predicting cost overruns is not a one-time exercise at sanction. The sanctioned distribution should be a live forecast that updates as actuals come in through execution. Every monthly project review should ask: given the actuals to date, has the live distribution moved? And if so, in what direction, and why?

The classic failure mode is that the deterministic plan ages and decays through execution. Six months in, the schedule is slipping and the cost is creeping, but each individual monthly variance looks small enough to absorb, and the deterministic plan stays on the page. By the time the variance is undeniably material, the rescue options have closed off — long-lead items have been ordered, the contractor is locked in, scope changes are prohibitively expensive.

A live probabilistic forecast surfaces the divergence months earlier. If the sanctioned P80 was $1.45B and the live forecast P80 is now $1.62B by month 8, the committee has a real conversation to have at month 8 rather than at month 18. The conversation might be "we accept the new P80, here is the additional contingency authorization" or "we are no longer sanctioning this project, we are restructuring." Either way, the option value of acting early is preserved. Capital project management software built around live distributions, rather than around deterministic variance reports, is what makes this practical.
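A sketch of what the monthly re-forecast can look like under simple assumptions: actuals locked in for completed work, the remaining items still probabilistic with re-calibrated distributions, and the 10% escalation threshold from the checklist at the end of this piece applied to the result. All names and numbers are illustrative:

    import numpy as np

    rng = np.random.default_rng(3)
    n_runs = 10_000

    # Remaining (not yet complete) line items: (median $M, lognormal sigma),
    # re-calibrated with what execution has shown so far.
    remaining = {
        "piping":     (190, 0.30),
        "electrical": (125, 0.25),
        "labor":      (330, 0.40),
    }
    spent_to_date  = 820.0    # actuals locked in for completed work, $M
    sanctioned_p80 = 1_450.0  # the P80 the committee sanctioned, $M

    # Re-run the remaining scope and add the sunk actuals to every run.
    sims = spent_to_date + sum(
        med * rng.lognormal(0.0, sig, n_runs) for med, sig in remaining.values()
    )
    live_p80 = np.percentile(sims, 80)
    print(f"Live P80: ${live_p80:,.0f}M (sanctioned ${sanctioned_p80:,.0f}M)")

    # Escalation trigger: live P80 more than 10% above the sanctioned P80.
    if live_p80 > 1.10 * sanctioned_p80:
        print("Escalate to the capital committee.")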

What to actually do on Monday

If you run a capital program and you want to start predicting overruns honestly, the action items are concrete and short:

  • Build a reference class. Pull the firm's last 20-30 completed projects of comparable type. Compute the actual-vs-sanctioned cost ratio for each. The mean and the P80 of that ratio are your reference-class adjustment factors. You can do this in a week.
  • Calibrate input distributions against actuals. Take the firm's last 50 cost line items and compute the actual-vs-estimate ratio. The empirical distribution of that ratio is what your input distributions should look like. It is almost always wider than what your estimators are using.
  • Specify a correlation matrix. List the top 10-15 cost and schedule inputs. For each pair, ask: is there a structural reason these would move together? Even a coarse 0.0 / 0.3 / 0.6 / 0.9 categorization is enormously better than the implicit 0.0 of the standard MCS setup.
  • Lead the sanction package with the distribution. Replace the headline single number with the P50/P80 pair plus the variance drivers. If the committee pushes back, that pushback is a feature — it forces the conversation to be about what the firm is actually committing to.
  • Update the distribution monthly. Make the live P80 a standing item in the monthly project review. The first time the live P80 diverges from the sanctioned P80 by more than 10%, escalate to the committee.

None of this is new math. It has been in the literature for forty years. What's new is the tooling that makes it operationally cheap to do at the cadence the committee actually needs.

Run reference-class + Monte Carlo on your portfolio

The Capital Project AI platform implements reference-class forecasting, calibrated input distributions, correlation modeling, and live updates against execution actuals — in a workflow designed for the capital committee, not the spreadsheet analyst.

Open the Dashboard →
