
L-BFGS Minimiser

The BFGS algorithm belongs to the class of variable metric methods and differs from the Davidon-Fletcher-Powell (DFP) method only in the way in which the correction term is constructed [132] (see below). It is generally recognised that the BFGS algorithm is superior to DFP with respect to convergence tolerance, roundoff error and other empirical issues [31]. The idea of the variable metric, or quasi-Newton, method is to build up a good approximation to the inverse Hessian matrix $ \hess ^{-1}$ by constructing a sequence of matrices $ \{{\bf A}_1,{\bf A}_2,{\bf A}_3,\dots\}$ with the following property:

$\displaystyle \lim_{i \rightarrow \infty} {\bf A}_i = \hess ^{-1}.$ (4.17)

If we are successful in doing so, combining two Newton-Raphson equations (see Equation 2.13) for consecutive iterations $ i$ and $ i+1$, we find that $ {\bf A}_{i+1}$ should satisfy

$\displaystyle {\bf x}_{i+1}-{\bf x}_{i} = \varpi_i \hat{\bf p}_i = - {\bf A}_{i+1} \left( {\bf g}_{i+1} - {\bf g}_{i}\right),$ (4.18)

where we have used the property $ {\bf A}_{i+1}={\bf A}_i$, which must hold if both $ {\bf x}_i$ and $ {\bf x}_{i+1}$ are in the neighbourhood of the minimum $ {\bf x}^*$.

At every iteration $ i$ a new approximation to the inverse Hessian $ {\bf A}_{i+1}$ must be constructed in order to calculate the next step. The update formula must be consistent with Equation 4.18 and could take the form

$\displaystyle {\bf A}_{i+1} = {\bf A}_{i} + \tilde{\bf A},$ (4.19)

where the correction term $ \tilde{\bf A}$ is constructed using the gradient difference $ \delta {\bf g} = {\bf g}_{i+1} - {\bf g}_{i}$, the step $ \delta {\bf x} = {\bf x}_{i+1} - {\bf x}_{i}$, and the inverse Hessian matrix from the previous iteration $ {\bf A}_i$. The BFGS update correction is [31]

$\displaystyle \tilde{\bf A} = \frac{\dx \otimes \dx }{(\dx ,\dg )} - \frac{{\bf u} \otimes {\bf u}}{(\dg ,{\bf u})} + (\dg ,{\bf u}) \left[ \left( \frac{\dx }{(\dx ,\dg )} - \frac{{\bf u}}{(\dg ,{\bf u})}\right) \otimes \left( \frac{\dx }{(\dx ,\dg )} - \frac{{\bf u}}{(\dg ,{\bf u})}\right) \right],$ (4.20)

where $ \otimes$ denotes the direct product of two vectors, and $ {\bf u} = {\bf A}_i \dg $.
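As an illustration, the update of Equations 4.19 and 4.20 can be written in a few lines. This is only a sketch (assuming NumPy; the function name is hypothetical), in which the scalar product $ (\cdot,\cdot)$ becomes a dot product and $ \otimes$ an outer product:

```python
import numpy as np

def bfgs_update(A, dx, dg):
    """One BFGS update of the inverse-Hessian approximation A
    (Equations 4.19 and 4.20): returns A + A_tilde, where A_tilde
    is built from the step dx and the gradient difference dg."""
    u = A @ dg                       # u = A_i dg
    dxdg = dx @ dg                   # (dx, dg)
    dgu = dg @ u                     # (dg, u)
    v = dx / dxdg - u / dgu          # vector whose outer product forms the last term
    A_tilde = (np.outer(dx, dx) / dxdg
               - np.outer(u, u) / dgu
               + dgu * np.outer(v, v))
    return A + A_tilde
```

One can verify directly that the updated matrix satisfies the quasi-Newton (secant) condition relating $ \dx $ and $ \dg $, and that it remains symmetric.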

Since at every iteration the old approximation to the inverse Hessian is overwritten with a new one, $ n^2/2 + n/2$ storage locations are needed. Compared with the conjugate gradient methods this extra storage is an entirely trivial disadvantage for any modest value of $ n$ [31]. However, for large-scale problems it is advantageous to be able to specify the amount of storage BFGS is allowed to use.

There is a version of the BFGS algorithm that allows the amount of storage available to be varied, and it is particularly suitable for large-scale problems [2]. The difference between this limited memory BFGS (L-BFGS) algorithm and the standard BFGS lies in the Hessian matrix update: L-BFGS stores each correction separately, and the inverse Hessian matrix is never formed explicitly. Storing each correction requires $ 2n$ storage locations [1], and the user can specify the number of corrections $ m$ that L-BFGS is allowed to keep. At every iteration $ {\bf A}{\bf g}$ is computed according to a recursive formula described by Nocedal [1]. For the first $ m$ iterations L-BFGS is identical to the BFGS method; after the first $ m$ iterations the oldest correction is discarded each time a new one is stored. Thus, by varying $ m$ the iteration cost can also be controlled, which makes the method very efficient and flexible. In general, because only the $ m$ most recent corrections are stored, L-BFGS can make use of any additional storage it is given to accelerate convergence. Here we employed a modified version of Nocedal's L-BFGS implementation [3] in which the line search was removed and the maximum step size was limited for each image separately.
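The recursive formula mentioned above can be sketched as follows. This is a minimal Python illustration of Nocedal's two-loop recursion (function and variable names are hypothetical; the calculations here were performed with the modified Fortran implementation [3]), in which a bounded deque of $ (\dx ,\dg )$ pairs implements the discarding of the oldest correction:

```python
import numpy as np
from collections import deque

def lbfgs_direction(g, history, gamma=1.0):
    """Two-loop recursion: compute A g from the stored correction
    pairs (dx_k, dg_k) without ever forming A explicitly.
    `history` holds the pairs oldest first; `gamma` scales the
    initial diagonal guess A_0 = gamma * I."""
    q = g.copy()
    alphas = []
    for dx, dg in reversed(history):            # first loop: newest to oldest
        rho = 1.0 / (dg @ dx)
        alpha = rho * (dx @ q)
        alphas.append(alpha)
        q -= alpha * dg
    r = gamma * q                               # apply the initial guess A_0
    for (dx, dg), alpha in zip(history, reversed(alphas)):  # second loop: oldest to newest
        rho = 1.0 / (dg @ dx)
        beta = rho * (dg @ r)
        r += (alpha - beta) * dx
    return r                                    # r = A g; the step taken is along -r

# A deque with maxlen=m drops the oldest pair automatically:
# history = deque(maxlen=m); history.append((dx, dg))
```

Each stored pair costs $ 2n$ locations and each loop iteration only a few vector operations, so both the storage and the per-iteration cost scale linearly with $ m$, consistent with the discussion above.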

Semen A Trygubenko 2006-04-10