L-BFGS Minimiser

Next: A Single-ended Method: Eigenvector-following Up: Optimisation of the Nudged Previous: Quenched Velocity Verlet Minimiser Contents

L-BFGS Minimiser

The BFGS algorithm belongs to the class of variable metric methods and is slightly different from the Davidon-Fletcher-Powell (DFP) method in the way in which the correction term is constructed [132] (see below). It has become generally recognised that the BFGS algorithm is superior to DFP in convergence tolerances, roundoff error and other empirical issues [31]. The idea of the variable metric, or quasi-Newton, method is to build up a good approximation to the inverse Hessian matrix $\hess ^{-1}$ by constructing a sequence of matrices $\{{\bf A}_1,{\bf A}_2,{\bf A}_3,\dots\}$ with the following property:

$\displaystyle \lim_{i \rightarrow \infty} {\bf A}_i = \hess ^{-1}.$

(4.17)

If we are successful in doing so, combining two Newton-Raphson equations (see Equation 2.13) for consecutive iterations and , we find that ${\bf A}_{i+1}$ should satisfy

$\displaystyle {\bf x}_{i+1}-{\bf x}_{i} = \varpi_i \hat{\bf p}_i = - {\bf A}_{i+1} \left( {\bf g}_{i+1} - {\bf g}_{i}\right),$

(4.18)

where we have used the property ${\bf A}_{i+1}={\bf A}_i$ , which must hold if both ${\bf x}_i$ and ${\bf x}_{i+1}$ are in the neighbourhood of minimum ${\bf x}^*$ .

At every iteration the new approximation to the inverse Hessian ${\bf A}_{i+1}$ should be constructed in order to calculate the next step. The update formula must be consistent with Equation 2.18 and could take the form

$\displaystyle {\bf A}_{i+1} = {\bf A}_{i} + \tilde{\bf A},$

(4.19)

where the correction term $\tilde{\bf A}$ is constructed using the gradient difference $\delta {\bf g} = {\bf g}_{i+1} - {\bf g}_{i}$ , the step $\delta {\bf x} = {\bf x}_{i+1} - {\bf x}_{i}$ , and the inverse Hessian matrix from the previous iteration ${\bf A}_i$ . The BFGS update correction is [31]

$\displaystyle \tilde{\bf A} = \frac{\dx \otimes \dx }{(\dx ,\dg )} - \frac{{\bf... ...left( \frac{\dx }{(\dx ,\dg )} - \frac{{\bf u}}{(\dg ,{\bf u})}\right) \right],$

(4.20)

where $\otimes$ denotes the direct product of two vectors, and ${\bf u} = \hess _i \dg$ .

Since at every iteration the old Hessian is overwritten with a new one storage locations are needed. This is an entirely trivial disadvantage over the conjugate gradient methods for any modest value of [31]. However, for large-scale problems it is advantageous to be able to specify the amount of storage BFGS is allowed to use.

There is a version of the BFGS algorithm that allows us to vary the amount of storage available to BFGS and is particularly suitable for large-scale problems [2]. The difference between this limited memory BFGS (L-BFGS) algorithm and the standard BFGS is in the Hessian matrix update. L-BFGS stores each correction separately and the Hessian matrix is never formed explicitly. To store each correction to the Hessian storage locations are needed [1]. The user can specify the number of corrections L-BFGS is allowed to store. Every iteration ${\bf A}{\bf g}$ is computed according to a recursive formula described by Nocedal [1]. For the first iterations L-BFGS is identical to the BFGS method. After the first iterations the oldest correction is discarded. Thus, by varying the iteration cost can also be controlled, which makes the method very efficient and flexible. In general, because only recent corrections are stored L-BFGS is better able to use additional storage to accelerate convergence. Here we employed a modified version of Nocedal's L-BFGS implementation [3] in which the line search was removed and the maximum step size was limited for each image separately.

Next: A Single-ended Method: Eigenvector-following Up: Optimisation of the Nudged Previous: Quenched Velocity Verlet Minimiser Contents

Semen A Trygubenko 2006-04-10