Reliability
Model reliability means that the model accurately describes
the possible outcomes of a potential action, and the reasons
for them. In other words, a reliable model gives the right
answer for the right reasons in all potential operating conditions.
Though reliability cannot be proven except in hindsight, one
key question can be asked of a model in advance: are the model
assumptions, and the output calculated from them, consistent
with everything that is known about the real situation? That
is, does the model behavior conform to historical data, expert
knowledge, and common sense? Ventana's tools and approach
are designed to ensure that Ventana models are consistent
with all available information, both during development and
over time.
By extensive triangulation among data, informed expectations,
and causal hypotheses, Ventana first cross-checks all sources
of understanding against one another for consistency and completeness.
Vensim® Causal Tracing® speeds the tracing of inconsistencies
and gaps to their roots for fast resolution. This process
refines the knowledge base on which the model is built, ensuring
that decisions are based on the best information possible.
(For more information, please see Ventana Modeling Techniques.)
Ventana continually checks the developing model for self-consistency
and for consistency with the refined information, using seven
tests of reliability which can be used with any model:
1. Fit to historical data: This test measures how closely model outputs match historical data when given true historical inputs. Ventana employs statistical measures appropriate to each situation to understand when discrepancies between model output and historical data are due to data noise, and when they are signals of gaps in model logic. While a model must meet this standard, it should not be the only test, because it is too easy to pass.
2. Predictive fit to historical periods: One way to detect model errors is, for example, to calibrate the model using data through 1999, predict the outcomes for the year 2000, and compare predicted results to actual values. Understanding the causes of any discrepancies often reveals key gaps in model structure. This technique is also useful for comparing the predictive power of a new model to existing methods of decision analysis. (Once structural gaps have been fixed, however, this is no longer a fair comparison, since the later data have been used to guide the model development.)
3. Units of measure: Models are, in the end, collections of mathematics, expressed in equations. One tenet of applied mathematics is that every equation must have consistent units (compare "apples to apples"). While this requirement is straightforward, many common business analysis tools, such as spreadsheets, have no facility for catching unit errors. As a result it is easy for errors to go undetected, potentially skewing results. The Vensim modeling environment automatically checks units throughout the model. Enforcing consistent units also helps clarify the definitions of new concepts.
4. Physical conservation: Another straightforward requirement is that strict accounting must be observed for all physical quantities. Ventana models explicitly track stocks of people, things, and money, as well as the flows which increase or deplete them, to assure that resources do not appear or vanish inexplicably. Despite the logic of this requirement, models that inadvertently violate this law of conservation are extremely common.
5. Real world causality: In Ventana models, each variable has a clear real-world interpretation, and the equation calculating the change of each variable over time describes not only how much it changes, but why. The causal story described by each equation must be correct. If model output matches past results, but for reasons known to be wrong, future predictions by that model cannot be trusted. During development, Ventana cross-checks each causal assumption with client experts and available data in order to find and fix inconsistencies.
6. Sensible behavior in all conditions - Reality Check®: A model should always produce sensible results, and Vensim® Reality Check® can be used to test for sensible behavior automatically. Reality Check is a library of descriptions of sensible behavior such as "if we raise prices, sales volume should decrease," which can be tested in the model on command. Reality Check provides an excellent way for subject experts to critique a model, without having to know anything about the model assumptions or techniques: they can simply provide statements of how the model ought to behave under certain circumstances. These are stored as Reality Check statements and model behavior can be checked against them at any time. When the model fails a Reality Check test, it means either that the model must be revised, or that the definition of "sensible behavior" is vague. (For instance, in the price-volume example, if the total market is growing, raising prices will reduce market share but sales volume may still increase.) Reality Check libraries also include expert technical knowledge (e.g., the known response of a chemical compound to a certain stimulus), and common sense thinking about extreme conditions (e.g., the output of a production line if there is zero workforce). Reality Check easily tests the model against the entire library on command and reports discrepancies. As these are resolved, the model and the knowledge base remain mutually consistent over time, reinforcing the reliability of decisions based on them.
A note on extreme conditions:
In certain extreme conditions, results can be predicted without doubt. For example, an extreme-condition Reality Check test appropriate to a factory might be "if there are no workers, production should be zero." These extreme examples often seem so obvious and trivial as not to be worth mentioning, but they are very important. Ventana models represent the connections among variables in all circumstances. This assures that production (in the example) would respond appropriately to any change in workforce, no matter how large. Many existing models, particularly linear regressions, would fail this test and report continued production even with no workforce. This failure is often dismissed with the argument that such models give correct results within the organization's normal operating regime. However, most Ventana clients are trying to move out of their normal operating regime and into a better one. Furthermore, relying on a model that cannot handle the full range of possibilities sets up an organization to be blindsided by new events, making its model irrelevant just at the time that strategic guidance is most needed. For both these reasons, a model that can only handle small changes is unlikely to be helpful.
7. Uniqueness: It sometimes happens that two models, based on different but equally plausible causal assumptions, pass all of the tests above. When this happens, the two models will generally predict different outcomes for a particular action. More frequently, a model will fit history equally well with different possible settings for some of its parameters. In this situation, Ventana models help organizations to pinpoint the additional knowledge or data that must be collected to determine which model is correct. They also report the resulting uncertainty, and help organizations to design an appropriate hedge.
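The zero-workforce test from the note on extreme conditions can be illustrated with a small sketch. This is plain Python, not Vensim or Reality Check syntax, and every name and number in it is a hypothetical chosen for illustration. It fits a straight line to production data drawn only from a normal operating range, then asks both the fitted line and a simple causal form what happens with no workers.

```python
# Illustrative only: plain Python, not Vensim or Reality Check syntax.
# All data and function names below are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical history from the normal operating regime (80 to 120 workers).
workers = [80, 90, 100, 110, 120]
production = [820, 880, 930, 970, 1000]  # units per week

a, b = fit_line(workers, production)

def regression_model(w):
    """Linear regression: fits the observed range, says nothing about why."""
    return a + b * w

def structural_model(w, output_per_worker=9.2):
    """Assumed causal form: output exists only if workers produce it."""
    return w * output_per_worker

def passes_zero_workforce_check(model, tolerance=1e-9):
    """Extreme-condition test: with no workers, production should be zero."""
    return abs(model(0)) <= tolerance

for name, model in [("regression", regression_model),
                    ("structural", structural_model)]:
    verdict = "pass" if passes_zero_workforce_check(model) else "FAIL"
    print(f"{name}: production at zero workers = {model(0):.1f} -> {verdict}")
```

With these numbers the fitted intercept is 470, so the regression reports 470 units per week from an empty factory, while the structural form returns zero. The structural form here is deliberately crude (it ignores the diminishing returns visible in the data); the point is only that a model built from a causal statement can be held to the extreme-condition test, whereas a curve fitted inside the normal range cannot.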
Over time, routine automated checks can be performed to compare
the model to the growing volume of accumulated data and to
the growing Reality Check library. Consistency with more and
more data and knowledge increases the reliability of the model.
And should the model, data, and knowledge base ever get out
of alignment, these checks instantly alert users to the discrepancy.
This gives early warning that some aspect of the organization's
understanding may need to be revised, and points to the factors
in question.
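As a rough sketch of what such a routine check might look like, the following Python keeps a small library of named checks (conservation, a sensible-behavior rule, and fit to history) and reruns all of them whenever new data arrive. It is an assumption-laden illustration, not Vensim functionality: the single-stock model, the check names, the tolerance, and the data are all hypothetical.

```python
# Illustrative only: plain Python, not Vensim syntax.
# The model, check names, tolerance, and data are hypothetical.
import math

def simulate(inflows, outflows, initial_stock):
    """Integrate one stock: stock[t+1] = stock[t] + inflow[t] - outflow[t]."""
    stock = [initial_stock]
    for inflow, outflow in zip(inflows, outflows):
        stock.append(stock[-1] + inflow - outflow)
    return stock

def check_conservation(run):
    """Final stock must equal initial stock plus inflows minus outflows."""
    expected = run["stock"][0] + sum(run["inflows"]) - sum(run["outflows"])
    return math.isclose(run["stock"][-1], expected)

def check_non_negative(run):
    """A physical stock should never drop below zero."""
    return min(run["stock"]) >= 0

def check_fit(run, tolerance=5.0):
    """RMSE between simulated and observed stock must stay within tolerance."""
    pairs = list(zip(run["stock"], run["observed"]))
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in pairs) / len(pairs))
    return rmse <= tolerance

# The check library grows over time as experts add statements.
CHECKS = {
    "physical conservation": check_conservation,
    "stock never negative": check_non_negative,
    "fit to history": check_fit,
}

def run_checks(run, checks=CHECKS):
    """Return the names of all checks the run fails (empty list = consistent)."""
    return [name for name, check in checks.items() if not check(run)]

inflows, outflows = [10, 12, 11, 13], [8, 9, 12, 10]
run = {
    "inflows": inflows,
    "outflows": outflows,
    "stock": simulate(inflows, outflows, initial_stock=100),
    "observed": [100, 103, 104, 105, 106],
}
print("failed checks:", run_checks(run))

# Later, a new observation arrives that the model does not explain.
drifted = dict(run, observed=[100, 103, 104, 105, 130])
print("failed checks after new data:", run_checks(drifted))
```

The first call reports no failures; after the divergent observation is added, only the fit check fails, which points the user at the gap between model and data rather than at the model's internal accounting.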
A model which passes all of these tests is consistent with
everything that is known about the reality of the organization
and its business: what has happened, what should happen, and
why. That is the best that can be said of any basis for making
a decision. Passing these tests does not guarantee that a
model is reliable, but failing them guarantees that it is
not.