SOMA

Guide

P50, P80, P95 in Cost Estimation: Which Confidence Level Should You Actually Use?

P50 is the IPA-required central estimate for UK capital cost. P80 is the UK departmental sensitivity convention. P95 is for safety-critical programmes and portfolio-level safeguards. How to pick the right confidence level for project sanction — and what HM Treasury Green Book and IPA Cost Estimating Guidance actually say, versus the working conventions departments use in practice.

Adam O'Neill | Part of Quantitative risk analysis (QRA)

P50, P80, P90 and P95 — what UK guidance actually says and where each is used

| Confidence level | Probability of not exceeding | Where it is referenced in UK guidance | Typical use in practice |
| --- | --- | --- | --- |
| P50 | 50% | IPA Cost Estimating Guidance: required as the "Median Scenario / P50 equivalent" central estimate at every stage gate | IPA-mandated central estimate; NEC4 Option C/D target cost (per AACE / NEC users' group commentary); internal management reporting and pain-gain incentive targets |
| P80 | 80% | Not specified by HM Treasury Green Book or IPA as a written requirement; the de facto departmental convention for upper-bound sensitivity (Homes England guidance treats P50/P80 as standard sensitivities) | Departmental capital approvals where finance teams have set P80 as their risk-appetite point; sensitivity test alongside the IPA P50 central estimate |
| P90 | 90% | Used as the illustrative example in HM Treasury Green Book 2022 (para 6.72): "practitioners may express the uncertainty around a central estimate using a P90 value" | Worked example in Green Book guidance; some departmental contexts where finance prefers a tighter upper bound than P80 |
| P95 | 95% | Not specified by Green Book or IPA; sometimes adopted by safety-critical and high-consequence programmes (ONR-regulated, nuclear decommissioning) as portfolio-level safeguard | Nuclear decommissioning; safety-critical infrastructure; portfolio-level safeguard envelopes where downside consequences are catastrophic |

What P50, P80 and P95 actually mean

P50, P80 and P95 are percentile points on a cost or schedule probability distribution. A P50 cost means the project has a 50% probability of coming in at or below that figure. P80 means 80% probability, and P95 means 95%. The distribution itself comes from Monte Carlo simulation of the QRA model — thousands of iterations across the base estimate and identified risks, producing a full range of possible outcomes.
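As a rough illustration of how percentile points fall out of a simulation, the sketch below runs a very small Monte Carlo model and reads off P50, P80 and P95. The triangular distributions, the risk probabilities and every figure in it are hypothetical, chosen only to show the mechanics; a real QRA model would also handle correlation between inputs, which this sketch ignores.

```python
import math
import random

def simulate_costs(base, risks, iterations=20_000, seed=1):
    """Return sorted simulated total costs (in £m).

    base  : (low, most_likely, high) range for the base estimate
    risks : list of (probability, low, most_likely, high) discrete risks
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(iterations):
        # random.triangular takes its arguments as (low, high, mode)
        total = rng.triangular(base[0], base[2], base[1])
        for prob, low, likely, high in risks:
            if rng.random() < prob:  # the risk occurs in this iteration
                total += rng.triangular(low, high, likely)
        totals.append(total)
    totals.sort()
    return totals

def percentile(sorted_values, p):
    """Nearest-rank percentile: the cost not exceeded in p% of iterations."""
    index = math.ceil(p / 100 * len(sorted_values)) - 1
    return sorted_values[max(0, min(len(sorted_values) - 1, index))]

# Hypothetical inputs, for illustration only
base = (90.0, 100.0, 120.0)
risks = [(0.30, 5.0, 10.0, 30.0), (0.10, 10.0, 20.0, 60.0)]

costs = simulate_costs(base, risks)
p50, p80, p95 = (percentile(costs, p) for p in (50, 80, 95))
print(f"P50 £{p50:.1f}m  P80 £{p80:.1f}m  P95 £{p95:.1f}m")
```

The fixed seed only makes the run repeatable; changing it moves the percentiles slightly, which is a reminder that the reported P-values carry sampling noise of their own.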

The distinction matters because the gaps between these percentiles are not symmetric. Most project cost distributions are right-skewed, with a longer tail towards overrun than towards underrun, so each additional percentage point of confidence costs more the further up the distribution you go. A project with a P50 of £100m might have a P80 of £118m and a P95 of £135m: the 30 points from P50 to P80 cost £18m, while the 15 points from P80 to P95 cost almost as much again at £17m. That asymmetry is what makes the choice of confidence level a commercial decision rather than a statistical one.
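One way to make the asymmetry concrete is to express each step as a cost per percentage point of confidence. The short calculation below uses the illustrative figures from the paragraph above (P50 £100m, P80 £118m, P95 £135m); the variable names are just for this sketch.

```python
# Illustrative figures from the text, in £m
p50, p80, p95 = 100.0, 118.0, 135.0

# Cost of each additional percentage point of confidence
cost_per_point_50_80 = (p80 - p50) / (80 - 50)
cost_per_point_80_95 = (p95 - p80) / (95 - 80)

print(f"P50 to P80: £{cost_per_point_50_80:.2f}m per point")
print(f"P80 to P95: £{cost_per_point_80_95:.2f}m per point")
```

On these numbers a point of confidence above P80 costs nearly twice what a point below it does (about £1.13m against £0.60m).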

The single most common misunderstanding is to treat P80 as "the answer". A P80 is the cost figure the project has an 80% probability of not exceeding, which means there is a 20% probability that it will be exceeded. Across ten similar, independent projects funded at P80, two should be expected to overrun on average. Whether that is the right risk appetite depends entirely on who is paying and what the consequences of an overrun would be.
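The "ten projects" intuition can be checked with a binomial calculation. The sketch below assumes the projects are independent and that each P80 is well calibrated, which is a simplification; in practice overruns across a portfolio tend to be correlated.

```python
from math import comb

def overrun_distribution(n_projects, p_overrun):
    """Binomial probability of exactly k overruns, for k = 0..n_projects."""
    return [
        comb(n_projects, k) * p_overrun**k * (1 - p_overrun) ** (n_projects - k)
        for k in range(n_projects + 1)
    ]

dist = overrun_distribution(10, 0.20)  # ten projects, each funded at P80

expected_overruns = sum(k * p for k, p in enumerate(dist))
p_none = dist[0]                 # no project overruns
p_three_or_more = sum(dist[3:])  # three or more overrun

print(f"Expected overruns: {expected_overruns:.1f}")
print(f"P(no overruns): {p_none:.2f}")
print(f"P(three or more): {p_three_or_more:.2f}")
```

Under those assumptions the expected number of overruns is two, but there is roughly a one-in-three chance of three or more and only about a one-in-nine chance of none at all.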

Why P80 became the UK default (and what the Green Book actually says)

P80 has become the de facto convention used by many UK departments for sensitivity testing and contingency-setting on major capital programmes — but it is worth being precise about where this comes from, because the published guidance is more nuanced than the convention suggests. The HM Treasury Green Book (2022, paragraphs 6.72–6.84) requires explicit adjustment for risk and optimism bias, and recommends advanced techniques such as Monte Carlo analysis for high-cost, high-risk proposals. It uses P90 as a worked example of how to express uncertainty around a central estimate ("practitioners may express the uncertainty around a central estimate using a P90 value") but it does not mandate any specific confidence level.

The IPA Cost Estimating Guidance is more specific in a way that surprises many people. It requires the central estimate to be presented at the Median Scenario / P50 equivalent — "the estimator believes that there are comparable probabilities of the actual outcome to be higher or lower than this threshold." The IPA Cost Estimating Requirements then express stage-gate confidence as percentage bands around that Anticipated Final Cost rather than as a single P-level: SOC target −20% / +50%, OBC target −15% / +30%, FBC target −10% / +10%. P80 is not the IPA's written requirement — it is a working benchmark many departments use for upper-bound sensitivity.
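The stage-gate bands quoted above lend themselves to a simple check. The sketch below tests whether a claimed low-to-high range sits inside the band for a given stage; the function name, the way the range is compared against the central estimate, and the example figures are assumptions made for illustration, not a method taken from the IPA guidance.

```python
# Stage-gate tolerance bands quoted in the text, as fractions of the central estimate
STAGE_BANDS = {
    "SOC": (-0.20, 0.50),
    "OBC": (-0.15, 0.30),
    "FBC": (-0.10, 0.10),
}

def within_band(stage, central, low, high):
    """True if the [low, high] range sits inside the stage's band around central."""
    lower_tol, upper_tol = STAGE_BANDS[stage]
    return low >= central * (1 + lower_tol) and high <= central * (1 + upper_tol)

# Hypothetical estimate: central (P50) £100m, range £92m to £118m
for stage in STAGE_BANDS:
    print(stage, within_band(stage, central=100.0, low=92.0, high=118.0))
```

With these hypothetical figures the range fits the SOC and OBC bands but not the FBC band, because £118m is more than 10% above the central estimate.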

The reasoning behind the P80 convention is pragmatic rather than theoretical. P50 alone is uncomfortable as a single funding figure because it means accepting a 50% probability of overrun. P95 is uncomfortable because the gap between P80 and P95 is typically so large on a right-skewed cost distribution that funding at P95 immobilises contingency that could have been deployed on other programmes. P80 sits between the two: beyond it, each additional pound of contingency buys meaningfully less risk reduction than the preceding steps did. Homes England's published guidance (one of the few UK public bodies to set this down explicitly) treats P50 and P80 as standard sensitivity tests unless alternatives are more appropriate. AACE International Recommended Practice 57R-09 treats P80 as one common reference point for integrated cost-schedule risk analysis outputs but does not specify it as a standard.

The practical position for a UK public-sector business case is therefore: present the P50 as the central estimate (IPA-required); present P80 as an upper-bound sensitivity (departmental convention); show the confidence range against the IPA stage-gate bands; and document why the funded position is where it is. "P80 because the Green Book says so" is not a defensible justification — the Green Book does not say so. "P80 because departmental finance has set that as our risk-appetite point and the IPA cost-band tolerance accommodates it" is defensible.

When P50 is the right answer

P50 has specific legitimate uses that are often overlooked. The most common is internal forecasting. The P50 is the median outcome: actual performance is as likely to land above it as below it, which makes it the appropriate benchmark for management-level performance reporting. Comparing actual cost against P50 (rather than against P80) tells the project team whether the programme is running better or worse than the central expectation. Comparing actual cost against P80 will, by construction, show the programme inside the figure about four times in five, which flatters delivery and is not actionable information for delivery management.

P50 is also appropriate for incentive structures. Pain-gain mechanisms, target cost contracts and performance-based contingency drawdown work best when the contractor is rewarded for beating the P50 and penalised for exceeding it. Setting these mechanisms against P80 creates a pain-gain that is nominally generous to the contractor but rewards underperformance, because beating P80 is not a meaningful achievement on a project that should deliver close to P50.

The third case is where the funding decision explicitly accepts risk. Innovation programmes, technology demonstrators and early-stage R&D are sometimes funded at P50 by sponsors who have deliberately chosen to accept higher probability of overrun in exchange for lower upfront capital commitment. This is a valid choice as long as it is made explicitly. The failure mode is funding at P50 while claiming P80 confidence — which is rarer than it was but still occurs on programmes where the QRA has not been scrutinised properly.

When P95 is the right answer

P95 is the right answer when the consequences of overrun are catastrophic in a way that justifies carrying substantial additional contingency. The clearest cases are safety-critical programmes and programmes where a funding shortfall would force termination of work that cannot easily be resumed. Nuclear new build programmes, some defence acquisition programmes, and certain regulated infrastructure contexts all have features that can justify P95 funding.

P95 is also sometimes appropriate for the overall funding envelope of a large programme, even when individual projects within it are funded at P80. The portfolio-level P95 is not simply the sum of individual project P95s. Because project outcomes are rarely perfectly correlated, diversification reduces the portfolio contingency relative to the sum of project contingencies, and the portfolio view allows for some cross-project risk sharing; the stronger the correlation between projects, the smaller that reduction. Public-sector spending authorities sometimes require portfolio P95 reporting as a safeguard against simultaneous material overruns across multiple projects.
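The difference between a portfolio P95 and the sum of project P95s can be seen in a small simulation. The sketch below assumes five identical projects with lognormal costs (median £100m, hypothetical spread) and compares two extremes: fully independent projects and perfectly correlated ones. Real portfolios sit somewhere in between.

```python
import math
import random

def p_value(values, p):
    """Nearest-rank percentile of a list of simulated costs."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(p / 100 * len(ordered)) - 1)]

def simulate_portfolio(correlated, n_projects=5, iterations=20_000,
                       median=100.0, sigma=0.2, seed=7):
    """Return (portfolio P95, sum of individual project P95s) in £m."""
    rng = random.Random(seed)
    per_project = [[] for _ in range(n_projects)]
    totals = []
    for _ in range(iterations):
        if correlated:
            # One shared shock drives every project in the same direction
            shock = rng.gauss(0, 1)
            draws = [median * math.exp(sigma * shock)] * n_projects
        else:
            draws = [median * math.exp(sigma * rng.gauss(0, 1))
                     for _ in range(n_projects)]
        for project_costs, cost in zip(per_project, draws):
            project_costs.append(cost)
        totals.append(sum(draws))
    sum_of_p95s = sum(p_value(costs, 95) for costs in per_project)
    return p_value(totals, 95), sum_of_p95s

independent_p95, independent_sum = simulate_portfolio(correlated=False)
correlated_p95, correlated_sum = simulate_portfolio(correlated=True)
print(f"Independent: portfolio P95 £{independent_p95:.0f}m vs sum £{independent_sum:.0f}m")
print(f"Correlated:  portfolio P95 £{correlated_p95:.0f}m vs sum £{correlated_sum:.0f}m")
```

In the independent case the portfolio P95 comes out well below the sum of project P95s; in the perfectly correlated case the two coincide, because every project overruns together and there is no diversification to rely on.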

The commercial cost of P95 is real and must be understood. Moving from P80 to P95 on a £500m programme might require an additional £40-60m of committed capital that cannot be deployed elsewhere. On a portfolio of programmes, this compounds significantly. A sponsor moving from P80 to P95 on principle, without a specific reason tied to the programme context, is choosing to immobilise substantial public or shareholder capital for a marginal reduction in overrun probability. That is a legitimate choice but it should be made deliberately.

The contingency calculation and who owns the drawdown

Once the confidence level is chosen, contingency is the difference between the P-point and the base estimate. Contingency at P80 is the P80 figure minus the deterministic base estimate. This is the amount of capital that needs to be held against the identified risk profile to achieve the chosen confidence level. It is not an arbitrary percentage — it is the output of the QRA model, and it has a defensible analytical basis.
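The calculation itself is a single subtraction, as the sketch below shows. The P-values reuse the illustrative figures from earlier in this guide; the £92m deterministic base estimate is a hypothetical figure added for the example.

```python
def contingency(p_point_cost, base_estimate):
    """Contingency needed to fund at a given confidence level (same units as inputs)."""
    return p_point_cost - base_estimate

base_estimate = 92.0  # hypothetical deterministic base estimate, £m
p_points = {"P50": 100.0, "P80": 118.0, "P95": 135.0}  # QRA outputs, £m

contingencies = {
    level: contingency(cost, base_estimate) for level, cost in p_points.items()
}
for level, amount in contingencies.items():
    print(f"{level}: £{amount:.0f}m ({amount / base_estimate:.0%} of base)")
```

Expressing the result as a percentage of the base estimate is useful for comparison across projects, but the percentage is an output of the model rather than an input to it.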

The more important question is who owns the contingency and what the drawdown rules are. Contingency held at the project level, released against project-specific risks with project-level approval, behaves very differently from contingency held at the portfolio level, released against any project's emerging risks with central approval. The former makes project teams self-managing; the latter creates a pool that is efficient but politically contentious.

The worst arrangement is contingency held at the project level with drawdown rules that are effectively unrestricted in practice. If every request to release contingency is approved, there is no meaningful governance on the P80 position: the project is effectively funded at whatever its worst outcome turns out to be, and the P80 figure becomes a starting point for spend rather than a ceiling on it. The opposite failure mode is contingency that is technically available but practically inaccessible because every drawdown request triggers a month of governance, so the project team treats it as unavailable and manages to the base estimate under informal pressure. Both fail to use the QRA output as intended.

What a good confidence-level conversation looks like

A well-run investment committee session on confidence level starts with the sponsor articulating their risk appetite in words: what is the tolerable probability of overrun, and what is the consequence of overrun if it occurs? The committee then looks at the QRA output and chooses a confidence level that is consistent with that articulated risk appetite — which might be P80 in most cases, but could be P50, P85, P90 or P95 depending on the context.

A poorly-run session starts with the default (usually P80), looks at the contingency requirement, and then negotiates the contingency down until the total funding envelope fits a pre-existing budget constraint. The P80 figure then becomes nominal — the funded position is actually P60 or P65, but the paperwork says P80. This is one of the most common patterns on UK public-sector programmes and it produces predictable overruns two to four years later.
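A simple guard against a nominal P80 is to read the confidence level back off the distribution at the amount actually funded. The sketch below does that against a hypothetical lognormal cost distribution (median £100m); in practice the sorted list would be the iteration results from the programme's own QRA model.

```python
import math
import random
from bisect import bisect_right

def implied_confidence(sorted_costs, funded_amount):
    """Percentage of simulated outcomes at or below the funded amount."""
    return 100 * bisect_right(sorted_costs, funded_amount) / len(sorted_costs)

# Hypothetical simulated outcomes, in £m, standing in for real QRA output
rng = random.Random(11)
costs = sorted(rng.lognormvariate(math.log(100.0), 0.2) for _ in range(20_000))

p80 = costs[math.ceil(0.80 * len(costs)) - 1]  # nearest-rank P80
funded = 108.0  # envelope after contingency was negotiated down (hypothetical)
implied = implied_confidence(costs, funded)

print(f"Model P80: £{p80:.1f}m")
print(f"Funded at £{funded:.0f}m, implied confidence about P{implied:.0f}")
```

On these assumptions the model P80 is around £118m, and a funded envelope of £108m corresponds to roughly P65, whatever the paperwork says.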

The practical test of a good confidence-level decision is whether it would be defensible to the sponsor's board, the National Audit Office, or an IPA gateway reviewer. If the reasoning can be written down in a paragraph — "we chose P80 because our tolerance for overrun on this programme is low given its strategic visibility, and the £25m incremental contingency over P50 is acceptable given the £300m total scope" — the decision is defensible. If the reasoning would look uncomfortable on paper, the decision is probably not one that will hold up when the programme hits its first difficult quarter.

Red flags when someone tells you "it's P80"

Claims that a programme is funded at P80 should be tested. The first question is: P80 of what QRA model, with what risk register, run when and by whom? A P80 produced from a QRA that has not been peer-reviewed is a number, not a confidence level. An assurance reviewer should be able to see the model, the risk inputs, and the correlation structure, and reach their own view on whether the claimed P80 is defensible.

The second question is whether the P80 was produced before or after the funding decision was settled. QRA models that produce the "right" P80 to match a predetermined funding envelope exist and are not hard to construct — a weak correlation structure here, an under-stated three-point estimate there, a couple of material risks omitted from the register, and the model produces whatever number the sponsor wanted to see. The discipline of external independent QRA is what protects against this, and it is one of the principal reasons why UK public-sector frameworks require independent assurance on major capital programmes.

The third question is whether the P80 is being treated as a live figure or a historical artefact. A P80 produced at sanction and never re-run is reliable only while the underlying conditions that produced it remain valid. After a material risk materialises, a scope change is absorbed, or the programme moves into a new phase, the P80 needs to be re-calculated — and the re-calculated figure may be higher than the original. Programmes that freeze the P80 at sanction and report against a stale figure are not doing live QRA; they are using a one-off number as a performance target, which is a different and weaker thing.

Putting it together

Choosing a confidence level is a commercial decision dressed up as a statistical one. The statistics tell you what the numbers mean; the decision is about what level of overrun probability the sponsor can tolerate, given the commercial, reputational and operational consequences if it occurs. P80 is a reasonable default for UK public-sector major projects but is not the answer for every case. P50 has legitimate uses in management reporting and incentive design. P95 is appropriate where the downside is catastrophic or where portfolio-level safeguards require it.

The most common failure is not choosing the wrong confidence level — it is choosing one and then not actually funding the programme at it. P80 on paper and P60 in practice is a worse position than being honest that the programme is funded at P60, because it leaves the sponsor with an unrealistic expectation of overrun probability and no plan for what happens when the base estimate turns out to have been optimistic.

SOMA delivers QRA and confidence-level advisory to UK infrastructure, defence, nuclear and public-sector programmes — including on CADMID Concept and Assessment-phase business cases where the confidence figures presented to the IAB at Initial Gate and Main Gate, and the corresponding scrutiny from IPA assurance reviews and (on larger programmes) the Cabinet Office Major Projects Review Group (MPRG), need to be substantiable. The Single Source Regulations Office sits in parallel on the pricing side of single-source defence contracts under the Defence Reform Act 2014; QRA confidence figures sit with IPA/IAB rather than SSRO. We structure the model, the risk register and the contingency discussion so that the confidence level being claimed is defensible — at Gateway, at Board, at NAO if it ever comes to that. That is the standard the work should meet, and it is the standard we deliver to.
