Three-Point Estimate Calibration — What Low, Most-Likely, High Should Actually Mean

The difference between a QRA that works and a QRA that produces theatre usually comes down to how the three-point estimates were elicited. A practical guide to doing it properly — and to the PERT formula (O + 4M + P) / 6 and the (O + 3M + P) / 5 variant.

Adam O'Neill · 24 April 2026 · Last updated 22 May 2026 · 10 min read · Part of Quantitative risk analysis (QRA)

What is the PERT formula?

The PERT formula calculates the weighted-mean expected value from a three-point estimate. Standard PERT (used in the Beta-PERT distribution): Expected Value = (Optimistic + 4 × Most Likely + Pessimistic) / 6. A modified variant with lighter weighting on the most-likely value, often seen in PMI study materials, is (Optimistic + 3 × Most Likely + Pessimistic) / 5. Both are weighted averages of a three-point estimate; in live QRA, what matters more than the formula choice is whether the three-point estimate is calibrated against evidence.

Why three-point estimates are the foundation of QRA

A Monte Carlo QRA is only as good as the three-point estimates feeding it. The model itself — correlation handling, simulation logic, the S-curve output — is relatively well-understood by practitioners and fairly consistent across tools. What varies dramatically is the quality of the input distributions. A QRA built on poorly-calibrated three-point estimates will produce a falsely narrow output band that looks rigorous but misleads the sponsor; a QRA built on well-calibrated estimates will produce a defensible confidence position that holds up under scrutiny.

A three-point estimate for a single activity or cost line item consists of three values: low (optimistic), most-likely, and high (pessimistic). These three points define a probability distribution — typically triangular, PERT or BetaPERT — that the Monte Carlo simulation samples from across thousands of iterations. The shape and position of each distribution determines how the uncertainty in that one activity or cost line contributes to the overall project distribution.

The most common failure in QRA practice is that three-point estimates are elicited in a way that systematically understates uncertainty. Workshop participants anchor on the base estimate, then offer low and high values that are barely distinguishable from it — producing narrow triangular distributions that, aggregated across a project, generate a Monte Carlo output that is almost identical to the deterministic base estimate. The model is technically run; the outputs are technically produced; and the QRA is technically delivered. It is also providing no analytical value.
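The effect described above can be illustrated with a small simulation. This is a minimal sketch rather than a full QRA model: the 20-activity schedule, the 50-day base estimate and both sets of ranges are hypothetical, and the activities are assumed to be independent and strictly sequential.

```python
import random

def simulate_total(activities, iterations=10_000, seed=1):
    """Sum one triangular sample per activity, for each iteration."""
    rng = random.Random(seed)
    totals = [
        sum(rng.triangular(low, high, mode) for low, mode, high in activities)
        for _ in range(iterations)
    ]
    return sorted(totals)

def percentile(sorted_values, p):
    """Nearest-rank percentile on an already sorted list (p in 0..100)."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]

# Hypothetical schedule: 20 sequential activities, 50-day base estimate each.
anchored = [(48, 50, 53)] * 20    # low/high barely distinguishable from the base
calibrated = [(40, 50, 80)] * 20  # wider, right-skewed ranges
base_total = 50 * 20

for label, activities in (("anchored", anchored), ("calibrated", calibrated)):
    p80 = percentile(simulate_total(activities), 80)
    uplift = (p80 / base_total - 1) * 100
    print(f"{label}: P80 = {p80:.0f} days ({uplift:.1f}% over base)")
```

With the anchored inputs the P80 sits within a couple of percent of the deterministic total, which is the "technically delivered, analytically empty" outcome; the wider inputs move it materially.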

The PERT formula — (O + 4M + P) / 6 and the (O + 3M + P) / 5 variant

The PERT formula gives a single weighted-mean point estimate from a three-point estimate. The classic version, derived from the Program Evaluation and Review Technique developed for the US Navy Polaris missile programme in 1958, is Expected Value = (Optimistic + 4 × Most Likely + Pessimistic) / 6. The four-times weighting is often read as an assumption that the most-likely outcome is roughly four times more probable than either extreme; more precisely, it is the weighting that matches the mean of the bell-like Beta-PERT distribution sampled by Monte Carlo tools. Standard deviation under classic PERT is approximated as (Pessimistic − Optimistic) / 6.

A less common variant uses (Optimistic + 3 × Most Likely + Pessimistic) / 5 as the weighted mean. This is sometimes called the "modified PERT" or "weighted three-point" formula and turns up in some PMI study materials, practitioner texts, and exam-prep questions. The three-times weighting produces a slightly flatter distribution and a mean closer to the simple average of the three points. The variant exists because the original PERT weights were calibrated for activity-duration estimates in a specific 1950s programme context; some practitioners argue that on modern projects with wider real-world ranges, the four-times weighting concentrates probability mass too tightly around the most-likely value to reflect real uncertainty.

A worked example makes the difference concrete. For a three-point estimate of Optimistic = 8 days, Most Likely = 12 days, Pessimistic = 22 days: standard PERT (O + 4M + P) / 6 gives an expected value of (8 + 48 + 22) / 6 = 13.0 days; the (O + 3M + P) / 5 variant gives (8 + 36 + 22) / 5 = 13.2 days; the simple triangular mean (O + M + P) / 3 gives (8 + 12 + 22) / 3 = 14.0 days. The three formulas converge when the estimate is symmetric and diverge most where the distribution is skewed — exactly where the choice of weighting matters most.
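The arithmetic in the worked example can be reproduced in a few lines. The function names here are illustrative, not taken from any particular QRA tool; the `weight` parameter is simply a way to switch between the four-times and three-times variants.

```python
def pert_mean(o, m, p, weight=4):
    """Weighted mean (O + weight*M + P) / (weight + 2)."""
    return (o + weight * m + p) / (weight + 2)

def triangular_mean(o, m, p):
    """Simple average of the three points."""
    return (o + m + p) / 3

def pert_sd(o, p):
    """Classic PERT approximation of the standard deviation."""
    return (p - o) / 6

o, m, p = 8, 12, 22  # days, from the worked example
print(f"classic PERT:    {pert_mean(o, m, p):.1f}")            # 13.0
print(f"3M variant:      {pert_mean(o, m, p, weight=3):.1f}")  # 13.2
print(f"triangular mean: {triangular_mean(o, m, p):.1f}")      # 14.0
print(f"PERT std dev:    {pert_sd(o, p):.1f}")                 # 2.3

# A symmetric estimate: all three formulas give the same answer.
print(pert_mean(8, 15, 22), pert_mean(8, 15, 22, weight=3), triangular_mean(8, 15, 22))
```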

In practice, the formula choice rarely matters for live QRA, because the Monte Carlo simulation samples from the full distribution, not the weighted-mean point estimate. Where the formulas are used is in quick deterministic work: sanity-checks of a three-point estimate before running the simulation, expected-value calculations on a deterministic schedule, contingency uplift on a small cost line that does not justify full Monte Carlo treatment, or PMI-style exam questions. For QRA practitioners, the more important question is whether the three-point estimate range itself is calibrated against evidence, which is what the rest of this guide is about. A correctly calibrated three-point estimate fed into a triangular, classic PERT or Beta-PERT distribution will produce similar Monte Carlo outputs; an incorrectly calibrated estimate fed into any of them will produce nonsense.

The anchoring trap and how to break it

When a workshop participant is asked "what is the optimistic duration for this activity?", the cognitive process almost always starts from the base estimate and subtracts a little. The same happens on the pessimistic side — starting from the base estimate and adding a little. The result is a symmetric, narrow triangular distribution centred on the base estimate, which is not a reflection of actual uncertainty; it is a reflection of anchoring bias.

Breaking the anchor requires different questioning. Instead of "what is the optimistic value?", ask "under what specific conditions would this activity finish early, and how early?". Instead of "what is the pessimistic value?", ask "what would have to go wrong for this activity to significantly exceed the plan, and how far would it push?". The shift from numerical to conditional elicitation forces participants to engage with scenarios rather than deltas from the anchor, and typically produces wider, more honest distributions.

Another technique is to collect the low and high values before revealing the base estimate. On a new activity, participants are asked for their low and high independently, and only then is the plan figure shared for comparison. This works well in early-stage QRA workshops where the base estimate is not yet fixed. On live programmes where the base is already agreed, the discipline of asking for conditions rather than values still produces materially different results from the naive approach.

Calibrating against evidence, not intuition

The strongest three-point estimates are calibrated against evidence — historic project data, benchmark performance, analogous activities on comparable programmes. If the activity being estimated is "install 200m of piling at Section B", and historic data from the same client on similar ground conditions shows piling rates varying between 8m/day and 25m/day with a mean of 16m/day, the three-point estimate should reflect that range rather than whatever the workshop participants happen to feel comfortable claiming.
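As a rough sketch of how the piling benchmark translates into durations: the function below simply divides the quantity by each production rate. Two assumptions are worth flagging: the historic mean rate is used as a stand-in for the most-likely value, and the observed extremes are used directly as low and high. A real workshop might adjust both.

```python
def durations_from_rates(quantity_m, slow_rate, typical_rate, fast_rate):
    """Convert benchmark production rates (m/day) into a
    (low, most_likely, high) duration estimate in days.

    The fastest rate gives the low (optimistic) duration and the
    slowest rate gives the high (pessimistic) duration.
    """
    if not 0 < slow_rate <= typical_rate <= fast_rate:
        raise ValueError("expected 0 < slow_rate <= typical_rate <= fast_rate")
    return (
        quantity_m / fast_rate,
        quantity_m / typical_rate,
        quantity_m / slow_rate,
    )

# 200m of piling; historic rates of 8 to 25 m/day with a mean of 16 m/day.
low, most_likely, high = durations_from_rates(200, slow_rate=8, typical_rate=16, fast_rate=25)
print(low, most_likely, high)  # 8.0 12.5 25.0
```

The resulting range is strongly right-skewed even though the rate range looks fairly balanced, which is one reason a workshop figure of "12 days, give or take two" should be challenged against the data.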

AACE International's Professional Guidance Document PGD-02, the Guide to Quantitative Risk Analysis, and Recommended Practice 57R-09 between them set out the methodology for evidence-based calibration in QRA workshops. The practical implication is that the workshop facilitator needs to come prepared with benchmark data where possible, and has to push back when participant estimates are materially inconsistent with that data. A participant who claims a 6-week pessimistic duration when every comparable programme has run for 12-16 weeks needs to be asked what specific differences justify the shorter figure.

Where historic data is not available — on innovative scope, first-of-a-kind work, or early-stage estimates — explicit use of expert judgement is unavoidable. The key is to be transparent about the basis of the estimate. "Based on expert judgement, low = 30 days, most likely = 45 days, high = 80 days, reflecting the significant uncertainty about equipment availability and the thin analogous data for this specific scope" is a defensible estimate. A bare set of numbers without explanation is not.

The shape of the distribution matters more than you think

Three-point estimates can be modelled with different distribution shapes, and the shape has a material effect on the Monte Carlo output. The three most common are: triangular (straight lines from low to most-likely to high), PERT (a smooth Beta-like shape weighted toward the most-likely value), and BetaPERT (a flexible variant that allows the relative weighting of the peak to be adjusted).

Triangular is the simplest and most common choice, and it is a reasonable default. Its limitation is that it treats the low and high values as absolute bounds: the Monte Carlo simulation will never sample outside that range. In practice, very few real-world activities have hard bounds; a piling rate of 2m/day is improbable but not impossible, and a triangular distribution with low = 5m/day will reject it entirely. PERT and BetaPERT distributions are bounded by the same low and high values, so they do not remove this limitation; what they change is the shape within the range, with smooth tails that taper toward the bounds and more probability mass near the most-likely value, at the cost of slightly more complex parameterisation.

For most UK infrastructure QRA work, triangular distributions are sufficient provided the low and high values are chosen to represent genuine 5th and 95th percentile points rather than absolute bounds. AACE RP 57R-09 discusses distribution shape selection in detail and is the reference standard for QRA practitioners. The practical rule is that if the distribution shape is materially affecting the P80 output, the three-point values are probably not well-calibrated in the first place — a well-calibrated triangular will produce a similar P80 to a well-calibrated PERT on most real activities.
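One way to test whether shape is driving the answer is to sample both distributions from the same three points and compare the P80. The sketch below uses the common Beta-PERT parameterisation (a Beta distribution rescaled to the low-high range, with a `weight` of 4 reproducing the classic PERT mean); the three points are borrowed from the worked example earlier in this guide, and the sample size and seed are arbitrary.

```python
import random

def sample_beta_pert(rng, low, mode, high, weight=4.0):
    """Draw one Beta-PERT sample: a Beta distribution rescaled to [low, high].

    With weight=4 the mean equals (low + 4*mode + high) / 6.
    """
    span = high - low
    alpha = 1 + weight * (mode - low) / span
    beta = 1 + weight * (high - mode) / span
    return low + span * rng.betavariate(alpha, beta)

def p_value(samples, p):
    """Nearest-rank percentile (p in 0..100)."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(p / 100 * len(ordered)))]

rng = random.Random(42)
low, mode, high = 8, 12, 22  # days
n = 50_000

tri = [rng.triangular(low, high, mode) for _ in range(n)]
pert = [sample_beta_pert(rng, low, mode, high) for _ in range(n)]

print(f"triangular P80: {p_value(tri, 80):.1f} days")
print(f"Beta-PERT  P80: {p_value(pert, 80):.1f} days")
```

If the two P80 values differ by more than the project can tolerate, that is a prompt to revisit the three points themselves before debating the shape.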

Separating variability from event risk

A recurring source of confusion in three-point estimate workshops is mixing variability risk with event risk. Variability is the inherent uncertainty in how an activity that definitely will happen will unfold — the piling activity will definitely take place; the question is whether it takes 40, 50 or 70 days. Event risk is a discrete occurrence that may or may not happen — the risk that unexpected ground conditions require additional pile length, which has a 30% probability and a 5-10 day impact if it materialises.

These two kinds of uncertainty should be modelled separately. Variability goes into the three-point estimate on the activity itself. Event risk goes into the risk register as a probability-weighted impact on the relevant activities. Mixing them — bundling ground conditions risk into a wider pessimistic value on the piling duration — produces a QRA that cannot identify specific risk drivers and therefore cannot support targeted mitigation planning.
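A minimal sketch of modelling the two separately, using the piling figures above: variability is a triangular distribution on the activity duration (40, 50, 70 days), and the ground-conditions risk is a separate event with a 30% probability. The uniform 5-10 day impact distribution is an assumption made for illustration; the text gives only the range.

```python
import random

def simulate_piling(iterations=20_000, seed=7):
    """Return (total_duration, event_occurred) for each iteration."""
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        # Inherent variability: the activity always happens.
        duration = rng.triangular(40, 70, 50)  # (low, high, mode)
        # Event risk: happens in roughly 30% of iterations.
        event_occurred = rng.random() < 0.30
        impact = rng.uniform(5, 10) if event_occurred else 0.0
        results.append((duration + impact, event_occurred))
    return results

results = simulate_piling()
with_event = [d for d, hit in results if hit]
without_event = [d for d, hit in results if not hit]

print(f"event occurred in {len(with_event) / len(results):.0%} of iterations")
print(f"mean duration without event: {sum(without_event) / len(without_event):.1f} days")
print(f"mean duration with event:    {sum(with_event) / len(with_event):.1f} days")
```

Because the event is tracked as its own line, its contribution to the output can be isolated, which is what makes targeted mitigation possible.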

The practical test is to ask: does the pessimistic value in this three-point estimate capture things that might go wrong, or does it capture how the activity will vary even if nothing goes wrong? If the pessimistic value is effectively "the worst realistic case including identifiable risks", the estimator has bundled event risk into variability. The fix is to pull the event risks out into the register where they belong, and recalibrate the three-point estimate to represent only the inherent variability of the activity.

The workshop discipline that makes estimates defensible

A QRA workshop that produces defensible three-point estimates has specific characteristics. It is facilitated by someone experienced enough to challenge anchored thinking and knowledgeable enough to introduce benchmark evidence. It has the right participants — people with real delivery knowledge of the scope in question, not just project management attendees. It is time-boxed per line item: five to ten minutes on each activity, not a token 30 seconds followed by consensus on the base estimate.

The workshop records the reasoning, not just the numbers. For each three-point estimate, the record captures the assumptions about scope, the conditions that would drive low and high outcomes, the benchmark data considered, and any dissenting views. This record is what makes the QRA defensible at gateway review — the reviewer can see not just the inputs but the thinking that produced them. A QRA that cannot show its working is vulnerable to challenge in a way that a fully-documented QRA is not.
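A workshop record of this kind could be captured in a simple structure. The sketch below is hypothetical: the class and field names are illustrative, not a standard schema, and the example values reuse the expert-judgement estimate quoted earlier in this guide, with placeholder wording for the scope and conditions.

```python
from dataclasses import dataclass, field

@dataclass
class EstimateRecord:
    """One three-point estimate plus the reasoning behind it."""
    activity: str
    low: float
    most_likely: float
    high: float
    scope_assumptions: str
    low_conditions: str    # what would have to be true for the low outcome
    high_conditions: str   # what would have to go wrong for the high outcome
    benchmark_data: str = "none available; expert judgement"
    dissenting_views: list[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.low <= self.most_likely <= self.high:
            raise ValueError("expected low <= most_likely <= high")

record = EstimateRecord(
    activity="Example first-of-a-kind activity",  # placeholder name
    low=30,
    most_likely=45,
    high=80,
    scope_assumptions="Scope as per current baseline",  # placeholder text
    low_conditions="Equipment available on first request",
    high_conditions="Equipment availability delayed; thin analogous data",
)
print(record.activity, record.low, record.most_likely, record.high)
```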

The workshop also explicitly discusses correlation. When activity A has a pessimistic value driven by severe weather, and activity B has a pessimistic value driven by the same severe weather, those two pessimistic values are not independent — if one materialises, the other is more likely to. Capturing correlation during elicitation (rather than adding it afterwards) produces a more coherent model and avoids the common failure mode of a QRA with zero correlation that systematically understates tail outcomes.
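The effect of a shared driver on the tail can be sketched with two activities. The three-point values below are hypothetical, and the `shared` parameter (the probability that one weather draw drives both activities) is a deliberately crude stand-in for the correlation handling that QRA tools provide.

```python
import math
import random

def triangular_ppf(u, low, mode, high):
    """Inverse CDF of a triangular distribution for u in [0, 1]."""
    split = (mode - low) / (high - low)
    if u < split:
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1 - u) * (high - low) * (high - mode))

def p90_of_total(shared, iterations=20_000, seed=3):
    """P90 of duration A + B.

    `shared` is the probability that the same draw drives both activities:
    0.0 means fully independent, 1.0 means they always move together.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(iterations):
        u_a = rng.random()
        u_b = u_a if rng.random() < shared else rng.random()
        totals.append(
            triangular_ppf(u_a, 20, 25, 45) + triangular_ppf(u_b, 30, 35, 60)
        )
    totals.sort()
    return totals[int(0.9 * len(totals))]

for shared in (0.0, 0.5, 1.0):
    print(f"shared driver probability {shared:.1f}: P90 = {p90_of_total(shared):.1f} days")
```

The means are the same in every case; only the tail moves, which is why a zero-correlation model can look plausible at P50 and still understate P90.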

Red flags in QRA three-point estimates

Three-point estimates worth questioning have specific signatures. Symmetric distributions on nearly every activity (low and high values approximately equidistant from the most-likely) are suspicious: real project activities are typically right-skewed, with more room to overrun than to finish early. If the register shows symmetric triangulars everywhere, the elicitation was probably driven by anchoring rather than by genuine engagement with the uncertainty.

Narrow distributions on long-duration activities are another flag. An activity with a most-likely duration of 180 days and low/high values of 170/195 has a total range of about 14% of the most-likely value and, modelled as a triangular, an implied coefficient of variation of roughly 3%, which is unrealistically tight for most construction and infrastructure work. The range should reflect real experience of how such activities actually perform, not what the planner would find comfortable to claim.
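The implied coefficient of variation can be checked directly. This sketch assumes the three points are the parameters of a triangular distribution and uses its closed-form mean and variance; a different distribution shape would give a somewhat different figure.

```python
import math

def triangular_cv(low, mode, high):
    """Coefficient of variation (std dev / mean) of a triangular distribution."""
    mean = (low + mode + high) / 3
    variance = (
        low**2 + mode**2 + high**2 - low * mode - low * high - mode * high
    ) / 18
    return math.sqrt(variance) / mean

cv = triangular_cv(170, 180, 195)
print(f"implied CV: {cv:.1%}")  # 2.8%
```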

Consistency of range across heterogeneous activities is a third flag. If the three-point estimates show a similar percentage spread on excavation, on M&E commissioning and on software integration, the estimates are almost certainly not being differentiated by activity. Different activities have different risk profiles; their three-point estimates should reflect that.
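The symmetry and uniform-spread flags lend themselves to a quick screen over the register. The sketch below is illustrative only: the register values, the function names and both tolerance thresholds are hypothetical and would need tuning to the programme in question.

```python
def spread_pct(low, mode, high):
    """Total range as a percentage of the most-likely value."""
    return (high - low) / mode * 100

def skew_ratio(low, mode, high):
    """(high - mode) / (mode - low); 1.0 means a symmetric estimate."""
    if mode == low:
        return float("inf")
    return (high - mode) / (mode - low)

def symmetric_activities(register, tolerance=0.15):
    """Names of activities whose high and low sit about equally far from the mode."""
    return [
        name
        for name, (low, mode, high) in register.items()
        if abs(skew_ratio(low, mode, high) - 1.0) <= tolerance
    ]

def uniform_spread(register, tolerance_pct=5.0):
    """True if every activity has roughly the same percentage spread."""
    spreads = [spread_pct(*points) for points in register.values()]
    return max(spreads) - min(spreads) <= tolerance_pct

# Hypothetical register: (low, most_likely, high) in days.
register = {
    "excavation": (90, 100, 110),
    "M&E commissioning": (45, 50, 55),
    "software integration": (27, 30, 33),
}

print("symmetric:", symmetric_activities(register))
print("same spread everywhere:", uniform_spread(register))
```

A register that trips both checks on most lines is a candidate for re-elicitation rather than re-modelling.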

When SOMA reviews a QRA that exhibits these patterns, we typically rebuild the estimates from the workshop up — not because the model is wrong, but because the inputs driving the model do not represent the uncertainty the project actually faces. The rebuild is expensive in effort but produces a QRA that can survive a gateway challenge, which is the point of doing the work.

Putting it together

Three-point estimate calibration is the part of QRA practice that most directly determines the quality of the output. A well-calibrated set of estimates produces a Monte Carlo distribution that is an honest reflection of the project's risk profile; a poorly-calibrated set produces a number that looks rigorous but isn't. The difference is visible in how the workshop is run — evidence-based, challenging, time-boxed per item, with the reasoning captured alongside the numbers — and in how the reviewer interrogates the outputs.

AACE International's reference standards for this work are PGD-02 (Guide to Quantitative Risk Analysis) and Recommended Practices 57R-09 (Integrated Cost and Schedule Risk Analysis Using Risk Drivers and Monte Carlo Simulation of a CPM Model), 113R-20 (Integrated Cost and Schedule Risk Analysis and Contingency Determination Using Combined Parametric and Expected Value) and 123R-22 (Integrated Cost and Schedule Risk Analysis and Contingency Determination Using Estimate Ranging and Expected Value with Monte Carlo Simulation) — three different methodologies for integrated cost-schedule QRA, each suited to a different evidence and modelling context. QRA that follows these practices is defensible at Gateway, at Board, and in independent assurance review. QRA that skips the calibration discipline to save workshop time tends to produce outputs that do not survive scrutiny.

SOMA facilitates QRA workshops for UK infrastructure, defence, nuclear and public-sector programmes — including the calibration-heavy three-point elicitation that supports CADMID Concept and Assessment-stage business cases. We bring benchmark data, we push back on anchoring, and we capture the reasoning so the outputs are defensible. The result is a QRA that does its job — giving the sponsor a confidence position they can actually fund against, and a risk register they can actually manage.
