Gaussian Process Models (Stat-Ease 360® only)

Note

Gaussian process models are only available in Stat-Ease 360. They are not available for split-plot designs or for designs that include blocks or other categorical factors.

Introduction

Gaussian process regression is a technique to fit multivariate factor data to a response. When appropriate, the resulting Gaussian process model (GPM) can be used to infer a functional relationship between response observations and numeric factor settings.

Quick Guidelines

Below are some guidelines to keep in mind when deciding if a Gaussian process model is appropriate for your response:

  • Prefer a Polynomial/Factorial Model: Instead of a Gaussian process model, prefer a polynomial or factorial model when possible. Polynomial and factorial models are time-tested and preserve all the usual model equations, transformations, statistical tests, and diagnostics. GPMs are essentially large matrices and are not easily summarized by a simple equation.

  • Noiseless Simulations: If your response is the result of a simulation that produces the exact same response value for the same factor settings (noiseless or zero-error simulations), consider a zero-error Gaussian process model.

  • Noisy Simulations: If your response is the result of a simulation that includes simulated noise added to the response value or noisy replicates, consider a noisy Gaussian process model.

  • Manual Transformations Required: Responses with observations that vary over several orders of magnitude may need to be manually transformed before importing to Stat-Ease 360 and fitting a Gaussian process model.

  • Fit Based on Coded Factor Data: The Gaussian process regression is performed on the coded factor data only.
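Because the regression uses coded data, actual factor levels are mapped onto a dimensionless scale before fitting. Below is a minimal sketch of the standard coding convention (low level to -1, high level to +1); the function name and the temperature range are illustrative, not part of Stat-Ease 360.

```python
import numpy as np

def to_coded(actual, low, high):
    """Map actual factor levels onto the coded [-1, 1] scale."""
    actual = np.asarray(actual, dtype=float)
    return 2.0 * (actual - low) / (high - low) - 1.0

# A temperature factor ranging from 100 to 200 degrees:
coded = to_coded([100, 150, 200], low=100.0, high=200.0)
print(coded)  # [-1.  0.  1.]
```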

Builds

Designs with space-filling factor settings, such as Latin Hypercube (LHC) or optimal distance designs, provide the best support (information over the factor space) for GPMs [SWN03]. Consider using one of these designs when planning an experiment that will use a Gaussian process model.


Space-filling designs support Gaussian process models.
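For illustration, a Latin hypercube sample can be generated outside of Stat-Ease 360 with SciPy's `qmc` module; the two factor ranges below are hypothetical.

```python
import numpy as np
from scipy.stats import qmc

# 12-run Latin hypercube in two factors, scaled to hypothetical ranges:
# temperature in [100, 200] and time in [0.5, 2.0].
sampler = qmc.LatinHypercube(d=2, seed=42)
unit_sample = sampler.random(n=12)            # points in [0, 1)^2
runs = qmc.scale(unit_sample, [100.0, 0.5], [200.0, 2.0])
print(runs.shape)  # (12, 2)
```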

Configuration

  • Gaussian Process (Zero-Error): Zero-error (noiseless) Gaussian process regression is best used in cases where the response is a deterministic function of the factor inputs. For this model, the noise parameter is set to zero and the overall-scale parameter is set to one. Neither is reported on the Gaussian Process Statistics tab.

  • Gaussian Process (Noisy): When repeated runs correspond to different response values or when the simulation includes simulated noise, a noisy Gaussian process model may be more appropriate. For this model, the noise parameter is allowed to be non-zero.

Gaussian Process Factors Tab

The method for choosing the Gaussian process parameters is selected on the Factors tab:

  • Manual Input: Enter the smoothing parameter manually (not recommended). For a noisy Gaussian process, the noise and overall-scale parameters are calculated by Maximum Likelihood.

  • Maximum Likelihood: Optimize the Gaussian process parameters by maximizing the log likelihood function. This is the preferred method of choosing the Gaussian process parameters.

  • Cross Validation: Optimize the Gaussian process parameters by maximizing the sum of leave-one-out log likelihood functions. In some cases, this may result in a fit that is more robust to model misspecification [RW06].

Gaussian Process Statistics


Gaussian Process Statistics tab (noisy Gaussian Process example)

See below for a description of each of the fields on the Gaussian Process Statistics tab.

Smoothing Parameter

The smoothing parameter is a dimensionless quantity characterizing how fast the response varies over the coded factor space. Larger values correspond to a response that varies more slowly (fewer wiggles and dimples) than smaller values. A smoothing parameter that is too large may over-smooth the response, while one that is too small may produce unrealistic interpolated values. Maximum Likelihood is the recommended method for selecting this parameter.
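To see the effect of the smoothing parameter, the sketch below evaluates the squared-exponential correlation (defined in the Mathematical Detail section) between two runs one coded unit apart for several smoothing values.

```python
import numpy as np

def sq_exp(dist, smoothing):
    """Squared-exponential correlation at a given coded distance."""
    return np.exp(-0.5 * (dist / smoothing) ** 2)

# Correlation between two runs one coded unit apart:
for ell in (0.2, 1.0, 5.0):
    print(ell, sq_exp(1.0, ell))
# The correlation rises toward 1 as the smoothing parameter grows,
# so nearby runs are forced to agree and the surface varies slowly.
```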

Noise Parameter

The noise parameter is a dimensionless quantity that characterizes the noise present in the response; see Overall Noise for an estimate in response units. Maximum Likelihood is the recommended method for selecting this parameter. For a fixed Overall Scale, larger values of the noise parameter correspond to a larger spread of observed response values about the mean. The noise parameter is not reported for zero-error Gaussian process models.

Overall Scale

The overall scale characterizes the variation of the predicted response in the vertical direction. Its value is automatically calculated. The overall scale is not reported for zero-error Gaussian process models.

Overall Noise

This value characterizes the spread of observed response values about the predicted mean and is analogous to a standard deviation for a polynomial fit.

Log Likelihood

The log likelihood for the Gaussian process regression. This is the quantity maximized when the Gaussian process parameters are selected using Maximum Likelihood.

CV Log Likelihood

The cross validation (CV) log likelihood for the Gaussian process regression. This value is computed by leave-one-out cross validation and is the quantity maximized when the Gaussian process parameters are selected using Cross Validation.

R Squared

The R Squared value, or coefficient of determination, is a one-number summary of the fraction of variation explained by the Gaussian process regression. For zero-error Gaussian process models, \(R^2\) will be 1 unless there are numerical problems or noisy replicates in the design.

See the Mathematical Detail section for more information.

Gaussian Process Diagnostics

The predictions and raw residuals are plotted and reported on the Diagnostics tab.


Diagnostics tab (noisy Gaussian process example)

For zero-error Gaussian process models, residuals should be very small relative to the size of the response and all points should fall on the 45-degree line of the Predicted vs Actual graph (unless there are noisy replicates).

For noisy Gaussian process models, the residuals should lie approximately evenly around zero and the points on the Predicted vs Actual graph should fall evenly around the 45-degree line.

Preferences

If there are problems with the fit, certain preference settings may improve it.


Random restarts: Controls the number of random starting locations attempted while searching for optimal parameters. By default, the fit starts from a single fixed value and no random points are attempted; increase the number of random restarts to search from additional starting locations.

Use derivatives: By default, the optimization uses a derivative-based approach to find optimal parameters and switches to a derivative-free approach only when no solutions can be found. Using derivatives is usually much faster but unchecking this setting may improve the fit in some cases.

Mathematical Detail

A Gaussian process is a stochastic (randomly determined) process consisting of random variables, any finite number of which have a joint Gaussian distribution [RW06].

Covariance

A Gaussian process model assumes that the response, \(y\), is a function of the numeric factor settings, \(\mathbf{x}\), so that \(y = f(\mathbf{x})\), and that the covariance between any two response values depends only on their factor settings,

\[\mathrm{cov}\left[y_i(\mathbf{x}_i),y_j(\mathbf{x}_j)\right] = \Sigma(\mathbf{x}_i, \mathbf{x}_j)\]

Kernel Function

The function \(\Sigma\) is called a kernel function and Stat-Ease software assumes a particular kernel function involving a Gaussian (squared exponential) plus some constant noise,

\[\Sigma(\mathbf{x}_i, \mathbf{x}_j) = \sigma_0^2 \left[\exp\left(-\frac{1}{2 \ell^2} \|\mathbf{x}_i - \mathbf{x}_j\|^2\right) + g^2 \delta_{i,j}\right] = \sigma_0^2 K(\mathbf{x}_i, \mathbf{x}_j)\]

where \(K\) is the unscaled kernel function,

\[K = K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2 \ell^2} \|\mathbf{x}_i - \mathbf{x}_j\|^2\right) + g^2 \delta_{i,j}\]

Here, \(\|\mathbf{x}_i - \mathbf{x}_j\|\) is the Euclidean distance between any two runs \(\mathbf{x}_i\) and \(\mathbf{x}_j\) in coded units.

The kernel includes tunable parameters (so-called hyper-parameters), \(\sigma_0\), \(\ell\), and \(g\). See the Gaussian Process Statistics section for more information on the parameters.
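A minimal NumPy sketch of this kernel follows, assuming the Kronecker delta applies only when the two sets of factor settings are the same runs (as in the training-versus-training matrix); this is an illustration, not Stat-Ease's internal code.

```python
import numpy as np

def kernel_matrix(X1, X2, ell, g, sigma0=1.0):
    """Scaled kernel matrix with elements Sigma(x_i, x_j).

    Simplification: the Kronecker delta term g**2 is added on the
    diagonal only when X1 and X2 are the same set of runs.
    """
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-0.5 * d2 / ell**2)
    if np.array_equal(X1, X2):
        K = K + g**2 * np.eye(len(X1))
    return sigma0**2 * K

X = np.array([[-1.0], [0.0], [1.0]])   # three runs of one coded factor
S = kernel_matrix(X, X, ell=1.0, g=0.1)
print(np.round(S, 4))
```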

This kernel function defines a matrix for any two sets of factor settings, \(\mathbf{X}_1\) and \(\mathbf{X}_2\), provided they have the same number of columns. Element \((i,j)\) of the matrix is the kernel evaluated at the \(i\text{th}\) row of \(\mathbf{X}_1\) and the \(j\text{th}\) row of \(\mathbf{X}_2\):

\[\mathbf{\Sigma}(\mathbf{X}_1, \mathbf{X}_2)_{i,j} = \Sigma(\mathbf{x}_{1,i}, \mathbf{x}_{2,j})\]

Predictions

With this in mind, for a factor data matrix \(\mathbf{X}\) and a response column vector \(\mathbf{y}\) of length \(n\) (the number of runs), the prediction at a new location in the factor space, \(\mathbf{x}\), is,

\[\mathbf{\hat{y}}(\mathbf{x}) = \mathbf{\Sigma}(\mathbf{X},\mathbf{x})^T \cdot \mathbf{\Sigma}(\mathbf{X},\mathbf{X})^{-1} \cdot \mathbf{y}\]

Variance

The estimated variance at that point is,

\[\hat{\sigma}^2_y(\mathbf{x}) = \mathbf{\Sigma}(\mathbf{x},\mathbf{x}) - \mathbf{\Sigma}(\mathbf{X},\mathbf{x})^T \cdot \mathbf{\Sigma}^{-1}(\mathbf{X},\mathbf{X}) \cdot \mathbf{\Sigma}(\mathbf{X},\mathbf{x})\]

The software reports intervals at \(\hat{y} \pm 2 \hat{\sigma}_y\).
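The prediction and variance equations can be sketched in NumPy as follows; this is an illustration under the kernel defined earlier, not Stat-Ease's implementation. With \(g = 0\) (zero-error), the model interpolates the training responses exactly and the variance at the design points collapses to zero.

```python
import numpy as np

def gp_predict(X, y, Xnew, ell, g=0.0, sigma0=1.0):
    """Posterior mean and variance per the two equations above."""
    def Sig(A, B, same):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-0.5 * d2 / ell**2)
        return sigma0**2 * (K + g**2 * np.eye(len(A)) if same else K)
    S = Sig(X, X, same=True)          # Sigma(X, X)
    k = Sig(X, Xnew, same=False)      # Sigma(X, x), one column per new point
    mean = k.T @ np.linalg.solve(S, y)
    var = sigma0**2 * (1 + g**2) - np.sum(k * np.linalg.solve(S, k), axis=0)
    return mean, var

# A zero-error (g = 0) model reproduces the training data exactly,
# and its variance at the design points is zero (up to round-off):
X = np.array([[-1.0], [0.0], [1.0]])
y = np.array([2.0, 0.0, 2.0])
mean, var = gp_predict(X, y, X, ell=1.0)
```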

Note

In the equations for \(\hat{y}\) and \(\hat{\sigma}_y^2\), the matrix \(\mathbf{X}\) refers to what is sometimes called the Gaussian process training data set (your factor data) and \(\mathbf{x}\) refers to a single point in what is sometimes called the Gaussian process test data set (some arbitrary point at which you want to predict).

Overall Scale

The overall scale, \(\sigma_0\), is calculated from the smoothing parameter, \(\ell\); the noise parameter, \(g\); the response vector, \(\mathbf{y}\); and the inverse of the unscaled kernel matrix, \(\mathbf{K}\) (which depends on \(\ell\) and \(g\)). For the special case of Maximum Likelihood selection,

\[\sigma_0^2 = \frac{1}{n} \mathbf{y}^T \mathbf{K}^{-1} \mathbf{y}\]

Overall Noise

The overall noise, \(\sigma_n\), is the product of the noise parameter, \(g\), and the overall scale, \(\sigma_0\),

\[\sigma_n = g \sigma_0\]

\(\sigma_n\) gives the overall error in response units for the \(n\) design points.
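Putting the two formulas together for a toy three-run design (all values are made up for illustration):

```python
import numpy as np

# Toy three-run design in one coded factor (illustrative values):
ell, g = 1.0, 0.2
X = np.array([[-1.0], [0.0], [1.0]])
y = np.array([3.0, 5.0, 4.0])

# Unscaled kernel matrix K, as defined in the Kernel Function section:
d2 = (X - X.T) ** 2
K = np.exp(-0.5 * d2 / ell**2) + g**2 * np.eye(len(y))

n = len(y)
sigma0_sq = (y @ np.linalg.solve(K, y)) / n   # overall scale squared (ML case)
sigma_n = g * np.sqrt(sigma0_sq)              # overall noise, in response units
print(sigma0_sq, sigma_n)
```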

Log Likelihood

The log marginal likelihood for the Gaussian process model is,

\[L = \ln{p(\mathbf{y}|\mathbf{X})} = -\frac{1}{2} \mathbf{y}^T \mathbf{\Sigma}^{-1} \mathbf{y} - \frac{1}{2} \ln|\mathbf{\Sigma}| - \frac{n}{2} \ln{2 \pi}\]

where the likelihood, \(p\), is a function of the response vector, the factor settings, and the fit parameters.
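Maximum Likelihood selection can be sketched by evaluating \(L\) directly from this formula and maximizing it over the logs of the parameters with SciPy. Stat-Ease 360's optimizer and parameterization may differ; this is only a demonstration of the idea.

```python
import numpy as np
from scipy.optimize import minimize

def log_likelihood(theta, X, y):
    """L from the equation above; theta = log(ell), log(g), log(sigma0)."""
    ell, g, sigma0 = np.exp(theta)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = sigma0**2 * (np.exp(-0.5 * d2 / ell**2) + g**2 * np.eye(len(y)))
    _, logdet = np.linalg.slogdet(S)
    n = len(y)
    return (-0.5 * y @ np.linalg.solve(S, y)
            - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi))

# Noisy synthetic data over one coded factor:
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 1))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.standard_normal(20)

res = minimize(lambda t: -log_likelihood(t, X, y), x0=np.zeros(3),
               method='Nelder-Mead')
ell, g, sigma0 = np.exp(res.x)   # fitted smoothing, noise, and overall scale
```

Working with log-parameters keeps \(\ell\), \(g\), and \(\sigma_0\) positive without constrained optimization.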

Cross Validation Log Likelihood

The log likelihood for leave-one-out (LOO) cross validation is,

\[L_{LOO} = \sum_{i=1}^n \ln{p(y_i|\mathbf{X},\mathbf{y}_{-i})}\]

where \(\left(\mathbf{X}_{-i},\mathbf{y}_{-i}\right)\) are the factor and response data with the \(i\text{th}\) row removed [RW06].
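A brute-force sketch of \(L_{LOO}\) refits with each run held out and sums the Gaussian log predictive densities of the held-out responses. Closed-form identities in [RW06] compute the same quantity without refitting; the loop below just makes the definition explicit.

```python
import numpy as np

def loo_log_likelihood(X, y, ell, g, sigma0):
    """Sum of Gaussian log predictive densities, each run held out in turn."""
    def Sig(A, B, same):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-0.5 * d2 / ell**2)
        return sigma0**2 * (K + g**2 * np.eye(len(A)) if same else K)
    n, total = len(y), 0.0
    for i in range(n):
        keep = np.arange(n) != i
        S = Sig(X[keep], X[keep], same=True)
        k = Sig(X[keep], X[i:i + 1], same=False)
        mu = (k.T @ np.linalg.solve(S, y[keep]))[0]
        var = sigma0**2 * (1 + g**2) - (k.T @ np.linalg.solve(S, k))[0, 0]
        total += -0.5 * np.log(2 * np.pi * var) - 0.5 * (y[i] - mu) ** 2 / var
    return total

X = np.array([[-1.0], [-0.3], [0.4], [1.0]])
y = np.array([1.2, 0.4, 0.6, 1.1])
print(loo_log_likelihood(X, y, ell=1.0, g=0.3, sigma0=1.0))
```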

R Squared

The R-Squared for the fit is,

\[R^2 = 1 - \sum_{i=1}^n \left(y_i-\hat{y}_i\right)^2 / \sum_{i=1}^n \left(y_i-\bar{y}\right)^2\]

where \(\hat{y}_i\) is the prediction at the \(i\text{th}\) design point. For zero-error Gaussian process models, \(\hat{y}_i\) equals \(y_i\) unless there are numerical problems or noisy replicates. Finally, \(\bar{y}\) is the mean response for the \(n\) runs,

\[\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i\]
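Given observed and predicted responses at the design points, the computation is direct (the numbers below are made up):

```python
import numpy as np

# Hypothetical observed and predicted responses at the design points:
y = np.array([3.1, 4.8, 6.2, 5.0])
y_hat = np.array([3.0, 4.9, 6.1, 5.2])

ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # 0.9857
```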

References

[RW06]

Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[SWN03]

Thomas Santner, Brian Williams, and William Notz. The Design and Analysis of Computer Experiments. Springer, 2003. ISBN 978-1-4419-2992-1. doi:10.1007/978-1-4757-3799-8.