CurveExpert Professional 2.7.3 documentation
Models which consist of a linear combination of a particular set of functions Xk are called linear models, and linear regression can be used to minimize the difference between the model and data. The general form of this kind of model is
where Xk(x) are fixed functions of x that are called the basis functions, and ak are the free parameters. Note that “linear” refers only to dependence of the model on the parameters ak; the functions Xk(x) may be nonlinear. Minimization of the above linear model is performed with respect to the merit function
The minimum of the above equation occurs where the derivative of S with respect to the parameters disappears. Substituting the linear model into this function, taking the first derivative, and setting this equal to zero gives the normal equations that can be solved directly for the parameters ak.
This program uses the Levenberg-Marquardt method to solve nonlinear regressions. This method combines the steepest-descent method and a Taylor series based method to obtain a fast, reliable technique for nonlinear optimization. This statement will be repeated later, but can bear stating at the beginning of this discussion: neither of the above optimization methods are ideal all of the time; the steepest descent method works best far away from the minimum, and the Taylor series method works best close to the minimum. The Levenberg-Marquardt (LM) algorithm allows for a smooth transition between these two methods as the iteration proceeds.
In general, the data modeling equation (with any number of independent variables) can be written as follows:
The above expression simply states that the dependent variable y
can be expressed as a function of the independent variables  and
vector of parameters 
 of arbitrary length. Note that using the ML
method, any nonlinear equation with an arbitrary number of
parameters can be used as the data modeling equation. Then, the
merit function we are trying to minimize is
where N is the number of data points,  denotes the x data points,
 denotes the y data points, 
 is the standard deviation
(uncertainty) at point i, and 
 is an arbitrary nonlinear
model evaluated at the ith data point. This merit function simply
measures the agreement between the data points and the parametric
model; a smaller value for the merit function denotes better
agreement. Commonly, this merit function is called the chi-square.
Now, we will take a step back to provide a framework of optimization methods. From the area of pure optimization, which will not be explained here, two basic ways of finding a function minimum are a Taylor series based method and the steepest-descent method. The Taylor series method states that sufficiently close to the minimum, the function can be approximated as a quadratic. Without detailed explanation of that method, a step from the current parameters a to the best parameters amin can be written as
where H is the Hessian matrix (a matrix of second derivatives). If the approximation of the function as a quadratic is a poor one, then we might instead use the steepest-descent method, where a step to the best parameters from the current parameters is
This equation simply states that the next guess for the parameters is a step down the gradient of the merit function. The constant c is forced to be small enough that a small step is taken and the gradient is accurate in the region that the step is taken.
Since we know the chi-square function, we can directly differentiate to obtain the gradient vector and the Hessian matrix. Taking the partial derivatives of the merit function with respect to a gives
To obtain the Hessian matrix, take the gradient of the gradient above (so that we have a matrix of partial second derivatives):
Now, for convenience, define the gradient vector and the curvature matrix as
[Note that the second derivative term in C will be ignored because
of two reasons: it tends to be small because it is multiplied by
(), and it tends to destabilize the algorithm for badly fitting
models or data sets contaminated with outliers. This action in no
way affects the minimum found by the algorithm; it only affects the
route in getting there.] So, the Taylor series method (inverse
Hessian method) can be written as the following set of linear
equations:
where NP is the number of parameters in the model that is being optimized. This linear matrix will be our workhorse for this method after some modification; it can be solved for the increments da that, when added to the current approximation for the parameters, gives the next approximation. Likewise, we can substitute our “convenient” definitions into the steepest descent formula to obtain
Neither of the aforementioned optimization methods are ideal all of the time; the steepest descent method works best far away from the minimum, and the Taylor series method works best close to the minimum. The Levenberg-Marquardt (LM) algorithm allows for a smooth transition between these two methods as the iteration proceeds.
The first issue in deriving the LM method is to attach some sort of
scale to the constant c in the steepest-gradient method (equation
2). Typically, there is no obvious way to determine this number,
even within an order of magnitude. However, in this case, we have
access to the Hessian matrix; examining its members, we see that
the scale on this constant must be . But, that still may be
too large, so let’s divide that scale by a nondimensional factor
(l) and plan on setting this much larger than one so that the step
will be reduced (for safety and stability).
The second issue to formulate the LM method is noting that the steepest-descent and Taylor series methods may be combined if we define a new matrix Mij by the following:
This matrix combines equations [1] and [2] into a convenient and compact form. So finally, we have a means of calculating the step da in the parameters by the following system of linear equations:
Note that when l is large, the matrix M is forced to be diagonally dominant; consequently, the above equation is equivalent to the steepest descent method (equation 2). Conversely, when the parameter l goes to zero, the above equation is equivalent to the Taylor series method (equation 1). So, we vary l to switch between the two methods, continually calculating a parameter correction da that we apply to our most recent guess for the parameter vector.
The steps that are taken in the LM algorithm are as follows:
Compute 
Pick a conservative value for  (0.001 in CurveExpert)
Solve the linear equations (equation 3) for 
Evaluate 
If , increase 
 by a factor (10 in CurveExpert) and go back to step [3].
If , decrease 
 by a factor (10 in CurveExpert), correct the parameter vector by 
, and go back to step [3].
Iteration is stopped when . This
tolerance 
 is the value specified by the user.