Stat-Ease Blog

Good Enough is Great: Why the Simpler Model Might Be Best

posted by Stat-Ease Team on April 15, 2026

(Adapted from Mark Anderson’s 2023 webinar “Selecting a Most Useful Predictive Model”)

There can be a moment when analyzing your response surface method (RSM) experiment that you feel let down. You designed it carefully, maybe as a central composite design built specifically to capture curvature via a quadratic model, but when the results come in, the fit statistics tell you that a linear model fits just fine—no curves needed.

At this point you probably feel cheated. You paid for quadratic, but you only got linear. Recognize, though, that this is not a failure: it's the experiment doing its job.

Designed for Quadratic, Fitted with Less

When George Box and K.B. Wilson developed the central composite design back in 1951, they built it to estimate a full quadratic model: main effects, two-factor interactions, and squared terms that let you map response peaks, valleys, and saddle points. It's a powerful structure, and for many process optimization problems you'll need every bit of it. But not always.

Take a typical study with three factors: say, reaction time, temperature, and catalyst concentration; and two responses to optimize, for example, conversion (yield) and activity. Fit the conversion response, and the quadratic earns its keep. The squared terms are significant, and curvature is real. You get a rich surface to work with. Satisfying.

Then you turn to activity. You run through the same fitting sequence: check the mean, add linear terms, layer in two-factor interactions, and try the quadratic, but the data keeps saying “no thank you” at each step beyond linear. The sequential p-values tell a clear story: main effects matter, but the added complexity contributes nothing.

The right answer isn't to force a quadratic model because that's what you designed for. Use the linear model. That's what the data supports.

Simpler Models Are Easier to Trust

A more parsimonious model—statistician-speak for "simpler, with fewer unnecessary terms"—has real advantages beyond just passing significance tests. Every term you add raises the risk of overfitting: chasing noise instead of signal. A model stuffed with insignificant terms can look impressive on paper while quietly falling apart when you try to predict new results.

The major culprit for bloated models is the R-squared (R²) statistic that most scientists tout as a measure of how well they fitted their results. Unfortunately, R² in its raw form is a very poor quality-indicator for predictive models because it climbs whenever you add a term, regardless of whether it means anything. It is far better to use a more refined form of this statistic called “predicted” R², which estimates how well your model will perform on data it hasn't seen yet.
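To see the contrast in action, here is a minimal sketch (made-up data, plain NumPy rather than Stat-Ease) that computes raw R² alongside predicted R², with the latter based on the leave-one-out PRESS statistic. The data are generated from a truly linear relationship, then fitted with both a linear model and a deliberately bloated one:

```python
import numpy as np

# Made-up illustration: truth is linear, so extra terms only chase noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 12)
y = 3 * x + rng.normal(0, 0.5, 12)

def fit_stats(X, y):
    """Return (raw R^2, predicted R^2) for a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    H = X @ np.linalg.pinv(X.T @ X) @ X.T          # hat (leverage) matrix
    press = np.sum((resid / (1 - np.diag(H))) ** 2)  # leave-one-out errors
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - np.sum(resid**2) / ss_tot, 1 - press / ss_tot

X_lin = np.column_stack([np.ones_like(x), x])            # parsimonious model
X_big = np.column_stack([np.ones_like(x), x, x**2, x**3])  # bloated model
print(fit_stats(X_lin, y))
print(fit_stats(X_big, y))
```

Raw R² can only climb as the extra columns are added, while predicted R² is free to fall, which is exactly why it is the better guide for predictive models.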

Trim the insignificant terms from a bloated model and you'll often see predicted R² go up, even as raw R² dips slightly. That's a good sign. For a good example of this counterintuitive behavior of R²s, check out this Stat-Ease software table showing the fit statistics on activity fit by quadratic versus linear models:

Fit statistics on activity fit by quadratic versus linear models
                        Activity (quadratic)   Activity (linear)
Std. Dev.               1.08                   0.9806
Mean                    60.23                  60.23
C.V. %                  1.79                   1.63
R²                      0.9685                 0.9564
Adjusted R²             0.9370                 0.9477
Predicted R²            0.7696                 0.9202
Adeq Precision          18.2044                29.2274
Lack of Fit (p-value)   0.3619                 0.5197

By the way, if you have Stat-Ease software installed, you can easily reproduce these results by opening the Chemical Conversion tutorial data (accessible via program Help) and, via the [+] key on the Analysis branch, creating these alternative models. This is a great way to work out which model will be most useful. Don’t forget: all else equal, the simpler model is always best—easier to explain, with fewer terms telling a cleaner story.

Here's a guiding principle: if adjusted R² and predicted R² differ by more than 0.2, try reducing your model. Bringing those two statistics closer together is usually a sign you're moving in the right direction.

So, When Do You Stop Tweaking?

This is where a lot of practitioners get into trouble—not by underfitting, but by endlessly refitting. There's always another criterion to check, another comparison to agonize over. Beware of “paralysis by analysis”!

George Box said it well: all models are wrong, but some are useful. The goal isn't a perfect model. The goal is a useful one. Here's how you know when you’ve made a good choice:

Check adequate precision. This statistic measures signal-to-noise ratio: anything above 4 is generally good. Strong adequate precision alongside reasonable R² values usually means you have enough model to work with, even if lack of fit is technically significant. (Lack-of-fit can mislead you, particularly when center-point replicates are run by highly practiced hands who nail that standard condition every time, giving you an artificially tight estimate of pure error.)
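Adequate precision is commonly computed as the range of predicted values divided by the square root of the average prediction variance, estimated as p·MSE/n (p = model terms, n = runs). A hedged sketch with made-up numbers; check your software's documentation for its exact formula:

```python
import numpy as np

def adeq_precision(y_hat, mse, p, n):
    """Signal-to-noise ratio: (max - min predicted) / sqrt(p * MSE / n)."""
    return (np.max(y_hat) - np.min(y_hat)) / np.sqrt(p * mse / n)

# Toy usage: a 20-unit prediction range, MSE of 1, 4 model terms, 16 runs.
print(adeq_precision(np.array([50.0, 60.0, 70.0]), 1.0, 4, 16))  # 40.0
```

A value of 40 is comfortably above the rule-of-thumb threshold of 4, indicating plenty of signal relative to noise.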

Look at your diagnostics, but don't over-interpret them. The top three are the normal plot of residuals, residuals-versus-run, and the Box-Cox plot for potential transformations. On the normal plot, apply the “fat pencil” test: if you can cover the points with a broad marker held along the line, you're fine. You're looking for a dramatic S-shape or an obvious outlier, not minor wobbles.

Try the algorithmic reduction, then compare. Stat-Ease software offers automatic model-reduction tools. Run one, compare the reduced model to the full model on predicted R² and adequate precision, and make a judgment call. If the statistics are similar and the model is simpler, take it.

Then press ahead. Once you've checked your fit statistics, run your diagnostics, and done a sensible reduction, go use the model! You can always get a second opinion (Stat-Ease users can request one from our StatHelp team), but at some point the model is good enough. That's the whole point.

The Liberating Truth

There's something freeing about accepting a linear model from an experiment designed for a quadratic. It means your process is well-behaved in that region: easy to interpret and likely to predict well. Now you can get on with finding the conditions that meet your experimental goals—a process that hits the sweet spot for quality and cost at robust operating conditions.

The experiment isn't a failure when it gives you something simpler than expected. It's doing exactly what a good experiment should do: telling you the truth.


Like the blog? Never miss a post - sign up for our blog post mailing list.


Hear Ye, Hear Ye: A Response Surface Method (RSM) Experiment on Sound Produces Surprising Results

posted by Mark Anderson on Feb. 23, 2026

A few years ago, while evaluating our training facility in Minneapolis, I came up with a fun experiment that demonstrates a great application of RSM for process optimization. It involves how sound travels to our students as a function of where they sit. The inspiration for this experiment came from a presentation by Tom Burns of Starkey Labs to our 5th European DOE User Meeting. As I reported in our September 2014 Stat-Teaser, Tom put RSM to good use for optimizing hearing aids.

Background

Classroom acoustics affect speech intelligibility and thus the quality of education. The sound intensity from a point source decays rapidly with distance according to the inverse square law. However, reflections and reverberations create variations by location for each student—some good (e.g., the Whispering Gallery at Chicago's Museum of Science and Industry—a very delightful place to visit, preferably with young people in tow), others bad (e.g., echoing). Furthermore, the acoustics can be expected to change quite a bit between an empty room and a fully occupied one. (Our then-IT guy Mike, who moonlights as a sound-system tech, called the audience “meat baffles”.)

Sound is measured on a logarithmic scale called “decibels” (dB). The dBA adjusts for varying sensitivities of the human ear.
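For reference, the inverse square law mentioned above works out to roughly a 6 dB drop for every doubling of distance from a point source in a free field; a quick sketch (reflections in a real room break this ideal pattern, which is what makes the experiment interesting):

```python
import math

def db_drop(d1, d2):
    """dB change in sound level moving from distance d1 to d2 from a point source."""
    return 20 * math.log10(d1 / d2)

print(round(db_drop(1, 2), 1))  # -6.0 dB per doubling of distance
```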

Frequency is another aspect of sound that must be taken into account for acoustics. According to Wikipedia, the typical adult male speaks at a fundamental frequency from 85 to 180 Hz. The range for a typical adult female is from 165 to 255 Hz.

Procedure


Photograph of the old Stat-Ease training room with bright yellow cups at even distances on the tables.

Stat-Ease training room at one of our old headquarters—sound test points spotted by yellow cups.

This experiment sampled sound on a 3x3 grid from left to right (L-R, coded -1 to +1) and front to back (F-B, -1 to +1)—see a picture of the training room above for location—according to a randomized RSM test plan. A quadratic model was fitted to the data, with its predictions then mapped to provide a picture of how sound travels in the classroom. The goal was to provide acoustics that deliver just enough loudness to those at the back without blasting the students sitting up front.

Using sticky notes as markers (labeled by coordinates), I laid out the grid in the Stat-Ease training room across the first 3 double-wide-table rows (4th row excluded) in two blocks:

  1. 2² factorial (square perimeter points) with 2 center points (CPs).
  2. Remainder of the 3² design (mid-points of edges) with 2 additional CPs.

I generated sound from the Online Tone Generator at 170 hertz—a frequency chosen to simulate voice at the overlap of male (lower) vs female ranges. Other settings were left at their defaults: mid-volume, sine wave. The sound was amplified by twin Dell 6-watt Harman-Kardon multimedia speakers, circa 1990s. They do not build them like this anymore 😉 These speakers reside on a counter up front—spaced about a foot apart. I measured sound intensity on the dBA scale with a GoerTek Digital Mini Sound Pressure Level Meter (~$18 via Amazon).

Results

I generated my experiment via the Response Surface tab in Design-Expert® software (this 3² design shows up under "Miscellaneous" as Type "3-level factorial"). Via various manipulations of the layout (not too difficult), I divided the runs into the two blocks, within which I re-randomized the order. See the results tabulated below.

Table of results from the sound experiment.
Block  Run  Space Type  A: L-R  B: F-B  Sound (dBA)
1      1    Factorial   -1      1       70
1      2    Center      0       0       58
1      3    Factorial   1       -1      73.3
1      4    Factorial   1       1       62
1      5    Center      0       0       58.3
1      6    Factorial   -1      -1      71.4
1      7    Center      0       0       58
2      8    CentEdge    -1      0       64.5
2      9    Center      0       0       58.2
2      10   CentEdge    0       1       61.8
2      11   CentEdge    0       -1      69.6
2      12   Center      0       0       57.5
2      13   CentEdge    1       0       60.5

Notice that the readings at the center are consistently lower than around the edge of the three-table space. So, not surprisingly, the factorial model based on block 1 exhibits significant curvature (p<0.0001). That leads to making use of the second block of runs to fill out the RSM design in order to fit the quadratic model. I was hoping things would play out like this to provide a teaching point in our DOESH class—the value of an iterative strategy of experimentation.
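A rough stand-in for the software's curvature test (which pools pure error from the replicates in a full ANOVA; the p<0.0001 above comes from that) is a simple two-sample t-test comparing the block 1 factorial corners against the block 1 center points:

```python
import numpy as np
from scipy import stats

# Block 1 data from the table above.
factorial = np.array([70.0, 73.3, 62.0, 71.4])  # corner responses
centers = np.array([58.0, 58.3, 58.0])          # center-point replicates

t, p = stats.ttest_ind(factorial, centers, equal_var=True)
print(p < 0.05)  # True: the center really is quieter than the corners
```

This crude check already flags curvature; the proper ANOVA-based test, with its tighter pure-error estimate, flags it far more strongly.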

The 3D surface graph shown below illustrates the unexpected dampening (cancelling?) of sound at the middle of our Stat-Ease training room.


3D Plot of the response surface from Stat-Ease 360 software

3D surface graph of sound by classroom coordinate.

Perhaps this sound ‘map’ is typical of most classrooms. I suppose that it could be counteracted by putting acoustic reflectors overhead. However, the minimum loudness of 57.4 (found via numeric optimization and flagged over the surface pictured) is very audible by my reckoning (having sat in that position when measuring the dBA). It falls within the green zone for OSHA’s decibel scale, as does the maximum of 73.6 dBA, so all is good.
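As a check on these numbers, the quadratic model can be refit to the tabulated runs by ordinary least squares (ignoring the block effect for simplicity) and the surface minimum located on a fine grid; a hand fit like this lands very close to the software's reported 57.4 dBA:

```python
import numpy as np

# Coded coordinates and responses from the results table above.
A = np.array([-1, 0, 1, 1, 0, -1, 0, -1, 0, 0, 0, 0, 1], float)
B = np.array([ 1, 0, -1, 1, 0, -1, 0, 0, 0, 1, -1, 0, 0], float)
y = np.array([70, 58, 73.3, 62, 58.3, 71.4, 58, 64.5, 58.2, 61.8, 69.6, 57.5, 60.5])

# Full quadratic model: intercept, A, B, AB, A^2, B^2.
X = np.column_stack([np.ones_like(A), A, B, A * B, A**2, B**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Scan the coded design space for the fitted minimum.
g = np.linspace(-1, 1, 201)
GA, GB = np.meshgrid(g, g)
Z = b[0] + b[1]*GA + b[2]*GB + b[3]*GA*GB + b[4]*GA**2 + b[5]*GB**2
print(round(Z.min(), 1))  # fitted minimum, near the reported 57.4 dBA
```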

What next

The results documented here came from an empty classroom. I would like to do it again with students (aka meat baffles) present. I wonder how that will affect the sound map. Of course, many other factors could be tested. For example, Rachel from our Front Office team suggested I try elevating the speakers. Another issue is the frequency of sound emitted. Furthermore, the oscillation can be varied—sine, square, triangle and sawtooth waves could be tried. Other types of speakers would surely make a big difference.

What else can you think of to experiment on for sound measurement? Let me know.




Are optimal response surface method (RSM) designs always the optimal choice?

posted by Richard Williams on Feb. 10, 2026

Most people who have been exposed to design of experiments (DOE) concepts have probably heard of factorial designs—designs that target the discovery of factor and interaction effects on a process. But factorial designs are hardly the only tool in the shed. Oftentimes, to properly optimize a system, a more advanced response surface method (RSM) design will prove beneficial, or even essential.

This is the case when there is “curvature” within the design space, suggesting that quadratic (or higher) order terms are needed to make valid predictions between the extreme high/low process factor settings. This gives us the opportunity to find optimal solutions that reside in the interior of the design space. If you include center points in a factorial design, you can check for non-linear behavior within the design space to see if an RSM design would be useful (1). But which RSM options should you pick?

Let’s start by introducing the Stat-Ease® software menu options for RSM designs. Once we understand the alternatives, we can better judge which might be most useful for any given situation, and why optimal designs are great when needed.

  • First on the list is the central composite design (our software default)
  • Next is the Box-Behnken design
  • And third is something called optimal design

Stat-Ease 360 software screenshot showing the design selection panel, with the RSM designs expanded.

Stat-Ease software design selection options

The natural question that often pops up is this. Since optimal designs are third on our list, are we defaulting to suboptimal designs? Let’s dig in a bit deeper.

The central composite design (“CCD”) has traditionally been the workhorse of response surface methods. It has a predictable structure (5 levels for each factor) and is robust to some variation in the actual factor settings: you will still get decent quadratic model fits even if the axial runs must be tweaked to achieve practical values, including the extreme case where the axial points are placed on the faces of the factorial “cube”, making it a 3-level study. A CCD is the design of choice when it fits the problem, and it generally creates predictive models that are effective throughout the factorial region of the design space. Note that the quadratic predictive models generally improve when the axial points reside outside the faces of the factorial cube.

When a 5-level study is not practical (for example, if we are studying catalyst level and the lower axial point would be zero or negative), we may be forced to bring the axial points to the faces of the factorial cube. When this happens, the Box-Behnken design is another standard option to consider. It is a 3-level design laid out slightly differently than a CCD. It typically has marginally fewer runs and is capable of creating very useful quadratic predictive models.

These standard designs are very effective when our experiments can be performed precisely as scripted by the design template. But this is not always the case, and when it is not, we need a more flexible approach to create a customized DOE.

Optimal designs are “custom” creations that come in a variety of alphabet-soup flavors—I, D, A, G, etc. The idea with optimal designs is that given your design needs and run-budget, the optimization algorithm will seek out the best choice of runs to provide you with a useful predictive model that is as effective as possible. Use of the system defaults when creating optimal designs is highly advised. Custom optimal designs often have fewer runs than the central composite option. Because they are generated by a computer algorithm, the number of levels per factor and the positioning of the points in the design space may be unique each time the design is built. This may make newcomers to optimal designs a bit uneasy. But, optimal designs fill the gap when:

  • The design space is not “cuboidal”— there are constraints on the operating region that make the design space lopsided or truncated.
  • There are categoric or discrete numeric factors to deal with.
  • The expected polynomial model is something other than a full quadratic.
  • You are trying to augment an existing design to expand the design space or to upgrade to a higher order model.
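To make the criterion concrete, here is a toy illustration of D-optimality, the most familiar of those alphabet-soup flavors. It brute-forces the 8-run subset of a 3x3 candidate grid that maximizes the determinant of the moment matrix X'X for a two-factor quadratic model. (Real software uses much smarter exchange algorithms over far larger candidate sets; only the criterion is the same.)

```python
from itertools import combinations, product
import numpy as np

# Candidate points: the 9 runs of a two-factor, 3-level grid.
candidates = list(product([-1, 0, 1], repeat=2))

def row(a, b):
    """Model-matrix row for a full quadratic in two factors."""
    return [1.0, a, b, a * b, a * a, b * b]

# Only 9 subsets of size 8 here, so check them all for max det(X'X).
best_det, best_set = -1.0, None
for subset in combinations(candidates, 8):
    X = np.array([row(a, b) for a, b in subset])
    d = np.linalg.det(X.T @ X)
    if d > best_det:
        best_det, best_set = d, subset

print(best_set)  # the D-optimal 8-run choice from this candidate grid
```

With a constrained or lopsided candidate region, the same criterion still applies, which is exactly why optimal designs shine when the design space is not cuboidal.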

The classic designs provide simple and robust solutions and should always be considered first when planning an experiment. However, when these designs don’t work well because of budget or practical design space constraints, don’t be afraid to go “outside the box” and explore your other options. The goal is to choose a design that fits the problem!

Acknowledgement: This post is an update of an article by Shari Kraber, “Modern Alternatives to Traditional Designs”, published in the April 2011 Stat-Teaser.

(1) See Shari Kraber’s blog post, “Energize Two-Level Factorials - Add Center Points!” from August 23, 2018 for additional insights.




Publication Roundup January 2025

posted by Rachel Poleke, Mark Anderson on Feb. 3, 2025

Welcome to our first Publication Roundup! In these monthly posts, we'll feature recent papers that cited Design-Expert® or Stat-Ease® 360 software. Please submit your paper to us if you haven't seen it featured yet!

Mark's comment: make sure to check out publication #4 by researchers from GITAM School of Science in Hyderabad, India. They provide all the raw data, the ANOVAs, and model graphs, and, most importantly, they show how to enhance the quality of medicines via multifactor design of experiments (DOE).

  1. Innovative study on chalcopyrite flotation efficiency with xanthate and ester collectors blend using response surface methodology (B.B.D): towards sustainability
    Scientific Reports volume 15, Article number: 65 (2025)
    Authors: Imkong Rathi & Shravan Kumar
  2. Fabrication and In Vivo Evaluation of In Situ pH-Sensitive Hydrogel of Sonidegib–Invasomes via Intratumoral Delivery for Basal Cell Skin Cancer Management
    Pharmaceuticals 2025, 18(1), 31
    Authors: Maha M. Ghalwash, Amr Gamal Fouad, Nada H. Mohammed, Marwa M. Nagib, Sherif Faysal Abdelfattah Khalil, Amany Belal, Samar F. Miski, Nisreen Khalid Aref Albezrah, Amani Elsayed, Ahmed H. E. Hassan, Eun Joo Roh, & Shaimaa El-Housiny
  3. Formulation development and evaluation, in silico PBPK modeling and in vivo pharmacodynamic studies of clozapine matrix type transdermal patches
    Scientific Reports volume 15, Article number: 1204 (2025)
    Authors: Abdul Qadir, Syed Umer Jan, Muhammad Harris Shoaib, Muhammad Sikandar, Rabia Ismail Yousuf, Fatima Ramzan Ali, Fahad Siddiqui, Abdul Jabbar Magsi, Ghulam Mustafa, Muhammad Talha Saleem, Shafi Mohammad, Mohammad Younis & Muhammad Arsalan
  4. Unique Research for Developing a Full Factorial Design Evaluated Liquid Chromatography Technique for Estimating Budesonide and Formoterol Fumarate Dihydrate in the Presence of Specified and Degradation Impurities in Dry Powder Inhalation
    Biomedical Chromatography: Volume 39, Issue 2, February 2025
    Authors: Lova Gani Raju Bandaru, Naresh Konduru, Leela Prasad Kowtharapu, Rambabu Gundla, Phani Raja Kanuparthy, Naresh Kumar Katari
  5. Synergistic effects of fly ash and graphene oxide composites at high temperatures and prediction using ANN and RSM approach
    Scientific Reports volume 15, Article number: 1604 (2025)
    Authors: I. Ramana & N. Parthasarathi
  6. Enhancement Strategy for Protocatechuic Acid Production Using Corynebacterium glutamicum with Focus on Continuous Fermentation Scale-Up and Cytotoxicity Management
    International Journal of Molecular Sciences 2025, 26(1), 396
    Authors: Jiwoon Chung, Wooshik Shin, Chulhwan Park, and Jaehoon Cho
  7. An exploration of RSM, ANN, and ANFIS models for methylene blue dye adsorption using Oryza sativa straw biomass: a comparative approach
    Scientific Reports volume 15, Article number: 2979 (2025)
    Authors: Sheetal Kumari, Smriti Agarwal, Manish Kumar, Pinki Sharma, Ajay Kumar, Abeer Hashem, Nouf H. Alotaibi, Elsayed Fathi Abd-Allah & Manoj Chandra Garg
  8. Manipulated Slow Release of Florfenicol Hydrogels for Effective Treatment of Anti-Intestinal Bacterial Infections
    International Journal of Nanomedicine, Volume 2025:20, Pages 541–555, 13 January 2025
    Authors: Luo W, Zhang M, Jiang Y, Ma G, Liu J, Dawood AS, Xie S, Algharib SA
  9. Preparation of slow-release fertilizer derived from rice husk silica, hydroxypropyl methylcellulose, polyvinyl alcohol and paper composite coated urea
    Heliyon, Volume 11, Issue 2, 30 January 2025
    Authors: Idayatu Dere, Daniel T. Gungula, Semiu A. Kareem, Fartisincha Peingurta Andrew, Abdullahi M. Saddiq, Vadlya T. Tame, Haruna M. Kefas, David O. Patrick, Japari I. Joseph
  10. Elimination of Ni(II) from wastewater using metal-organic frameworks and activated algae encapsulated in chitosan/carboxymethyl cellulose hydrogel beads: Adsorption isotherm, kinetic, and optimizing via Box-Behnken design optimization
    International Journal of Biological Macromolecules, 21 January 2025, In Press, Journal Pre-proof
    Authors: Gamil A.A.M. Al-Hazmi, Nadia H. Elsayed, Jawza Sh. Alnawmasi, Khadra B. Alomari, Ali Hamzah Alessa, Shareefa Ahmed Alshareef, A.A. El-Bindary
  11. QbD-Driven preparation, characterization, and pharmacokinetic investigation of daidzein-loaded nano-cargos of hydroxyapatite
    Scientific Reports volume 15, Article number: 2967 (2025)
    Authors: Namrata Gautam, Debopriya Dutta, Saurabh Mittal, Perwez Alam, Nasr A. Emad, Mohamed H. Al-Sabri, Suraj Pal Verma & Sushama Talegaonkar
  12. Lubricity potentials of Azadirachta indica (neem) oil and Cyperus esculentus (tiger nut) oil extracts and their blends in machining of mild steel material
    Heliyon, Volume 11, Issue 2, 30 January 2025
    Authors: Ignatius Echezona Ekengwu, Ikechukwu Geoffrey Okoli, Obiora Clement Okafor, Obiora Nnaemeka Ezenwa, Joseph Chikodili Ogu
  13. Process Evaluation and Analysis of Variance of Rice Husk Gasification Using Aspen Plus and Design Expert Software
    Chemistry Africa (2025)
    Authors: Ernest Mbamalu Ezeh, Isah Yakub Mohammed, Epere Aworabhi, Yousif Abdalla

The Importance of Center Points in Central Composite Designs

posted by Pat Whitcomb on March 10, 2023

A central composite design (CCD) is a type of response surface design that will give you very good predictions in the middle of the design space.  Many people ask how many center points (CPs) they need to put into a CCD. The number of CPs chosen (typically 5 or 6) influences how the design functions.

Two things need to be considered when choosing the number of CPs in a central composite design:

1)  Replicated center points are used to estimate pure error for the lack of fit test. Lack of fit indicates how well the model you have chosen fits the data.  With fewer than five or six replicates, the lack of fit test has very low power.  You can compare the critical F-values (with a 5% risk level) for a three-factor CCD with 6 center points, versus a design with 3 center points.  The 6 center point design will require a critical F-value for lack of fit of 5.05, while the 3 center point design uses a critical F-value of 19.30.  This means that the design with only 3 center points is less likely to show a significant lack of fit, even if it is there, making the test almost meaningless.
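Those two critical values can be reproduced with any statistics package. For a three-factor CCD fitting the 10-term quadratic model, the lack-of-fit degrees of freedom are 5 in both cases, while pure-error degrees of freedom equal the number of center-point replicates minus 1. Using SciPy:

```python
from scipy.stats import f

# Critical F (5% risk) for lack of fit: dfn = lack-of-fit df, dfd = pure-error df.
print(round(f.ppf(0.95, dfn=5, dfd=5), 2))  # 6 CPs: pure-error df = 5 -> 5.05
print(round(f.ppf(0.95, dfn=5, dfd=2), 2))  # 3 CPs: pure-error df = 2 -> ~19.30
```

The huge jump in the critical value with only 3 center points is what drains the lack-of-fit test of its power.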

TIP: True “replicates” are runs that are performed at random intervals during the experiment. It is very important that they capture the true normal process variation! Do not run all the center points grouped together as then most likely their variation will underestimate the real process variation.

2)  The default number of center points provides near uniform precision designs.  This means that the prediction error inside a sphere that has a radius equal to the ±1 levels is nearly uniform. Thus, your predictions in this region (±1) are equally good.  Too few center points inflate the error in the region you are most interested in.  This effect (a “bump” in the middle of the graph) can be seen by viewing the standard error plot, as shown in Figures 1 & 2 below. (To see this graph, click on Design Evaluation, Graph and then View, 3D Surface after setting up a design.)
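The "bump" can also be computed directly. The sketch below builds the model matrix of a three-factor CCD (assuming the rotatable axial distance of 1.682) and evaluates the standard error of prediction at the center, in units of sigma, for 6 versus 3 center points:

```python
import itertools
import numpy as np

def ccd(n_centers, alpha=1.682):
    """Three-factor CCD: 8 factorial + 6 axial + replicated center points."""
    pts = [list(p) for p in itertools.product([-1, 1], repeat=3)]
    for i in range(3):
        for s in (-alpha, alpha):
            p = [0.0, 0.0, 0.0]
            p[i] = s
            pts.append(p)
    pts += [[0.0, 0.0, 0.0]] * n_centers
    return np.array(pts)

def model_matrix(pts):
    """Full quadratic model in three factors (10 terms)."""
    x1, x2, x3 = pts.T
    return np.column_stack([np.ones(len(pts)), x1, x2, x3,
                            x1*x2, x1*x3, x2*x3, x1**2, x2**2, x3**2])

def se_at_center(n_centers):
    """Standard error of prediction at the design center, in units of sigma."""
    X = model_matrix(ccd(n_centers))
    XtX_inv = np.linalg.inv(X.T @ X)
    x0 = np.zeros(10)
    x0[0] = 1.0  # model row at the center point
    return float(np.sqrt(x0 @ XtX_inv @ x0))

print(se_at_center(6))  # lower: near-uniform precision inside the +/-1 region
print(se_at_center(3))  # higher: the "bump" in the middle of the graph
```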

3D evaluation graph of a design with 6 center points.

3D evaluation graph of a design with 3 center points.
Figure 1 (left): CCD with 6 center points (5-6 recommended). Figure 2 (right): CCD with only 3 center points. Notice the jump in standard error at the center of Figure 2.

Ask yourself this—where do you want the best predictions? Most likely at the middle of the design space. Reducing the number of center points below the default will substantially damage the prediction capability there! Although it can seem tedious to run all of these replicates, the recommended number of center points ensures that the analysis of the design can be done well, and that the design is statistically sound.