A few weeks ago, a process engineer hoping to glean a model of yield as a function of 8 factors asked me to explain the failure of the analysis of variance (ANOVA) to produce p values. See this deficiency on the left side of the software output shown in Figure 1. On the right side, notice the dire warning about the fit statistics. The missing p’s and other not-available (“NA”) statistics created great concern about the validity of the entire analysis.
Figure 1: Alarming results in ANOVA and fit statistics
The tip-off for what went wrong can be found in the footnote: “Case(s) with leverage of 1.” After poring over the inputs, which stemmed from existing data rather than a designed experiment, I discovered that many of the rows had been duplicated. Removing these ‘dups’ left only 9 unique runs to fit a linear model with 9 coefficients: 8 main-effect slopes for the 8 factors plus 1 for the intercept. The statistical software did the best it could with this ‘mission impossible.’ It did nothing wrong.
Creating total leverage, as in this multifactor case, can be likened to fitting a line to just two points. It leaves no degrees of freedom (df) for estimating error (as shown in the Figure 1 ANOVA). Thus the F-test cannot be performed and, therefore, no p values can be estimated.
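For readers who like to see this in numbers, below is a minimal sketch in Python with NumPy (my own illustration; the factor settings are random stand-ins for the engineer’s happenstance data). With 9 runs and 9 coefficients, every leverage comes out to 1 and zero degrees of freedom remain for error:

```python
import numpy as np

rng = np.random.default_rng(1)
factors = rng.uniform(-1, 1, size=(9, 8))   # 9 unique runs, 8 factors (stand-in data)
X = np.column_stack([np.ones(9), factors])  # add intercept: a 9 x 9 model matrix

H = X @ np.linalg.inv(X.T @ X) @ X.T        # the "hat" matrix
print(np.round(np.diag(H), 3))              # leverages: all 1.000 -> exact fit
print("residual df:", X.shape[0] - X.shape[1])  # 0 -> no F-test, no p values
```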
A model can be generated (barely!), but the lack of statistical tests leaves you with literally zero confidence in the outcome.
The remedy is very simple: Collect more data!
Leverage is a numerical value between 0 and 1 that indicates the potential for a design point to influence the model fit. It’s strictly a function of the design itself—not the responses. Thus, leverage can be assessed before running the experiment.
A leverage of 1 means that the model will exactly fit the observation. That is never good because, unless that point falls exactly where it ought to be, your predictive model will be off kilter.
Leverage (“L”) is an easy statistic to master. Averaged across the design, it equals the number of coefficients in your model divided by the number of unique runs (dups do not count!).
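If you want to compute leverages yourself, they are the diagonal elements of the so-called hat matrix, H = X(X'X)^-1 X', built entirely from the model-expanded design matrix X. Here is a minimal sketch in Python with NumPy (an illustration of mine, not output from any particular DOE package), using a simple 2^2 factorial as the example:

```python
import numpy as np

def leverages(X):
    """Leverages = diagonal of the hat matrix for model matrix X (n runs x p terms)."""
    return np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)

# A 2^2 factorial (4 runs) fit with intercept plus two main effects (p = 3)
X = np.array([[1, -1, -1],
              [1,  1, -1],
              [1, -1,  1],
              [1,  1,  1]], dtype=float)

h = leverages(X)
print(h)                                    # [0.75 0.75 0.75 0.75]
print(round(h.sum(), 3), round(h.mean(), 3))  # 3.0 (= p) and 0.75 (= p / 4 runs)
```

Note that the leverages always sum to the number of coefficients p, which is why the average works out to p divided by the number of runs.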
You have seen what happens when all the runs are completely leveraged (L=1). But even one run at a leverage of 1 creates issues. For example, consider a hypothetical experiment aimed at establishing a linear fit of a key process attribute Y on a single factor X. The researchers intend to make 20 runs at two levels. However, due to circumstances beyond their control, they achieve only one run at the high level. The 10 points at the low end come in at a leverage of 0.1 each, so none of them individually exerts much influence on the fit. That’s good. But the single point at the high level exhibits a leverage of 1, so it will be fitted exactly wherever it may fall. That’s not good, but it may be OK if the result lands where it ought to be. However, if something unusual happens at the high level, there will be no way of knowing. I would be very skeptical of such an experiment; best to go for a complete ‘do-over.’
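Here is a quick check of those numbers, again as a minimal NumPy sketch (the 0/1 coding of X below is my own choice for illustration):

```python
import numpy as np

x = np.array([0.0] * 10 + [1.0])            # 10 runs at low level, 1 lone run at high
X = np.column_stack([np.ones_like(x), x])   # straight-line model: intercept + slope

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
print(np.round(h, 3))   # 0.1 for each low point, 1.0 for the lone high point
```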
Watch for leverages close to 1.0. Consider replicating these points, or make sure they are run very carefully.
Some designs, such as standard two-level factorials with no center points, produce runs with equal leverage. However, others do not. For example, a two-level design on 4 factors with 4 center points features 16 factorial runs with a leverage of 0.9875 each, far exceeding the center-point leverage of 0.05. Nevertheless, applying the generally accepted guideline that leverages less than 2 times the average cause no great concern, this design gets a pass: the average leverage is 0.8. A two-level design with center points is like a teeter-totter; the points at the center sit at the fulcrum and thus carry very low leverage.
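You can verify these figures before ever running the design. The sketch below (Python with NumPy, my illustration) builds the two-level, four-factor design with 4 center points, expands it to the full 16-term model, and computes the leverages:

```python
import numpy as np
from itertools import combinations, product

runs = np.array(list(product([-1.0, 1.0], repeat=4)))  # 16 factorial runs
F = np.vstack([runs, np.zeros((4, 4))])                # plus 4 center points

# Full 16-term model: intercept, 4 mains, 6 2FIs, 4 3FIs, 1 4FI
cols = [np.ones(len(F))]
for k in range(1, 5):
    for terms in combinations(range(4), k):
        cols.append(np.prod(F[:, list(terms)], axis=1))
X = np.column_stack(cols)                              # 20 runs x 16 terms

h = np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)
print(np.round(h, 4))               # 0.9875 (factorial runs), 0.05 (centers)
print(round(h.mean(), 2))           # 0.8 -> nothing exceeds 2 x 0.8 = 1.6
```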
I advise you to focus only on runs with leverage greater than 2 times the average (or any with a leverage of 1, of course). It is best to identify high-leverage points before running the experiment via a design evaluation and, if affordable, replicate them, thus reducing their leverage.
Do not be greatly concerned if leverages get flagged after you remove insignificant terms from your model. For example, see the case study by our founder Pat Whitcomb in his article on “Bad Leverages” in the March 1998 Stat-Teaser, a must-read if you want a good grasp on leverage.
Keep in mind that, despite being flagged for high leverage (2x average), a design point may generate a response that typifies how the process behaves at that setting. In that case, it does not invalidate the model. Apply your subject matter knowledge and/or ask an expert colleague to be the judge of that.
If you use standard DOE templates or optimal tools to lay out an experiment, it is unlikely that your design will include points with leverage over twice the average. But if you override the defaults and warnings in your software, issues with leverage can arise. For example, I often see published factorial designs with only 1 center point, not the 3 or 4 that our software advises. This creates a leverage of 1 for the curvature test, which is not good. Believe it or not, as a peer reviewer for a number of technical journals I’ve also seen many manuscripts that lay out the recommended number of center points for standard designs (e.g., 4 for a two-level factorial), yet report exactly the same result for every one of them, a telltale sign of duplicated rows rather than true replicates. As already explained, when it comes to leverage, do not be duped by ‘dups.’
I am particularly wary of historical data from runs done haphazardly (no plan). These often create a cloud of points at one end with very few at the opposite extreme. For example, see the scatter plot in Figure 2 (real data from a study of infection rates over varying numbers of days at various hospitals in the USA).
Figure 2: A real-life dataset with a badly leveraged point
In this case, the point at the upper right exhibits a leverage of 0.99, versus an average of 0.17 for the other 12 points. If possible, replicating such a high-leverage point would be very helpful, cutting its leverage roughly in half. Better yet, do two more replicates to reduce this problematic point’s leverage to about one-third. Though it will not emerge as an outlier in the diagnostics (very unlikely for a highly leveraged point, since it will be closely fitted), this particular result must be carefully evaluated and ignored if determined to be exceptional.
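To see the arithmetic of replication at work, here is one last minimal sketch (Python with NumPy). The data are hypothetical stand-ins, not the real Figure 2 values, but the pattern is the same: a cluster of runs at one end and a lone point far out on its own:

```python
import numpy as np

def leverages(x):
    X = np.column_stack([np.ones_like(x), x])   # straight-line model
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

x = np.array([1, 2, 2, 3, 3, 4, 4, 5, 30.0])    # hypothetical: lone point at 30
print(round(leverages(x)[-1], 2))               # about 0.98, nearly 1!

x_rep = np.append(x, 30.0)                      # replicate the far-out point
print(np.round(leverages(x_rep)[-2:], 2))       # about 0.50 each, roughly halved
```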
Pay attention to leverage, ideally before you even run your experiment; if you are developing a model from existing data, check it in the diagnostics from your statistical software. Beware of totally leveraged runs: this is the worst-case scenario. If things are not quite that bad, watch for leverages more than twice the average and, if possible, replicate those runs. Otherwise, apply engineering and scientific expertise to decide whether the results can be accepted.