Next: Bibliography Up: thesis Previous: TOTAL ESCAPE PROBABILITY FOR Contents

FINDING THE SHORTEST AND THE FASTEST PATHS

$\begin{epigraphs} \qitem {There are many paths to the top of the mountain, \\ but the view is always the same.} {Chinese proverb} \end{epigraphs}$
Techniques for finding the shortest and the fastest pathways can be useful during sampling for pathways, in post-processing and analysis of DPS databases, and in various optimisations of pathway ensembles [290,291,192,10,8]. Given a digraph representation of a connected database of minima and transition states it is possible to identify the shortest path between any two minima using a breadth-first search algorithm [154], which runs in linear time. The shortest path, however, is not necessarily the fastest. Recently Evans suggested using Dijkstra's algorithm [153] for finding the path with the largest non-recrossing DPS rate [290]. It was pointed out by Carr, however, that the weight function used in Reference Evans03 is not positively defined, and the use of the Bellman-Ford-Moore (BFM) algorithm [292,294,293] was suggested instead [291]. Because Dijkstra's algorithm scales better than BFM algorithm (see Figure E.1) we describe below a weight function that enables us to use the Dijkstra's algorithm for finding the fastest path.

As detailed in Section 4.1.1, within a digraph representation of a coarse-grained PES each minimum is represented by a single node and each transition state is represented by two oppositely directed edges. Here we ignore degenerate rearrangements since they do not affect the rates, and for cases where more than one transition state links the same pair of minima only the one with the fastest forward and backward rates is used. Adopting the non-recrossing DPS rate definition from Equation 4.11 we can associate forward and backward rate constants

$\begin{displaymath}\begin{array}{lll} k_{a,b} &=& \displaystyle\frac{P_b^{\rm eq... ...mma_2,i_2}\cdots \dsum _{\gamma_n}k_{\gamma_n,i_n}} \end{array}\end{displaymath}$

(E.1)

with a pathway $\xi=a,i_1,i_2,\dots,i_n,b$ .

For a detailed description of Dijkstra and BFM algorithms we refer the reader to Reference CormenLRS01. A suitable choice of edge weight function to be used with either of the shortest path algorithms must be made. We define the weight of each directed edge $\alpha \rightarrow \beta$ as

$\displaystyle w(\beta,\alpha) = \ln\left(\dfrac{\sum_{\gamma}k_{\gamma,\alpha}}{k_{\beta,\alpha}}\right).$

(E.2)

Note that $w(\beta,\alpha) \ne w(\alpha,\beta)$ and $0 \leqslant w < \infty$ . Since the weight is non-negative it is possible to apply Dijkstra's algorithm to find the fastest pathway [154].

It makes sense to include only minima with more than one connection when searching for the fastest pathway. In our calculations we first iteratively identified a connected subset of minima with the number of connections greater than one, and then run the above algorithm for this subset.

To find the fastest path connecting minima and in the direction $b \rightarrow a$ it is necessary to solve either a single-source shortest paths problem with minimum chosen as a source, or a single destination shortest paths problem with minimum chosen as the destination. In the latter case all the edges in the graph need to be reversed, which can be accomplished by simply swapping the arguments to the weight function in Equation E.2. This reversal is possible because our graph is a symmetric digraph, i.e. if there is an edge $a \rightarrow b$ then there is also an edge $b \rightarrow a$ .

A useful check for correct implementation of this algorithm is to verify that the weight of the shortest path from minimum to minimum , , is related to the rate $k_{a,b}$ calculated for that pathway by

$\displaystyle k_{a,b} = \dfrac{P_b^{\rm eq}}{P_B^{\rm eq}} \exp \Bigl( -W\left(a,b\right) \Bigr) \sum_{\gamma}k_{\gamma,b}.$

(E.3)

The BFM algorithm runs in $\mO (\vert V\vert\,\vert E\vert)$ time while Dijkstra's algorithm as described in Reference dijkstra59 scales as $\mO (\vert V\vert^2+\vert E\vert)=\mO (\vert V\vert^2)$ [154]. Although both algorithms appear to have similar (quadratic) asymptotic complexity for the case of sparse graphs (e.g. $\vert E\vert=\mO (\vert V\vert)$ ) the scaling prefactor is smaller for Dijkstra's algorithm. In addition, for sparse graphs with $\vert E\vert=o(\vert V\vert^2/\lg\vert V\vert)$ an implementation of Dijkstra's algorithm that utilises a priority queue of some sort can improve the asymptotic bounds further. Priority queues based on binary min-heap and Fibonacci heap [276] result in $\mO ((\vert E\vert+\vert V\vert)\lg\vert V\vert)=\mO (\vert E\vert\lg\vert V\vert)$ and $\mO (\vert V\vert\lg\vert V\vert+\vert E\vert)$ scaling of the algorithm, respectively [154]. In the present work we have used a priority queue based on Fibonacci heap. A performance comparison of our implementations of BFM and static Dijkstra's algorithms is shown in Figure E.1.

**Figure:** Performance comparison of BFM and static Dijkstra algorithms for an evolving database of minima and transition states for tryptophan zipper . CPU time as a function of the number of minima in the database, , is shown. The number of transition states, , scales linearly with , and for the plotted data set the dependence can be approximated as with a correlation coefficient of . The units of time are arbitrary.
$\begin{psfrags} \psfrag{ 0} [bc][bc]{$0$} \psfrag{ 0.1} [bc][bc]{$0.1$} \p... ...includegraphics[width=.50\textheight]{fastest_path/BFMvsDij.eps}} \end{psfrags}$

Since knowledge of all the pathways on a complicated PES is out of the question a representative sample must be obtained to properly describe the property of interest. It is unclear what is the best strategy of evolving the pathway database if the proper description of kinetics is sought, as sampling techniques may introduce a bias towards pathways of particular type.

In previous DPS studies a set of perturbations was applied to the fastest known path to generate new ones [9,10,8]. New pathways were constructed from a subset of stationary points featured in the pathway being perturbed, and from these that were found as a result of perturbation. Because building the shortest paths tree with Dijkstra's algorithm is relatively cheap it is possible to analyse the whole database for the presence of new faster pathways rather than just the subset. Results for an example application are presented in Figure E.2.

**Figure:** The rate of the fastest pathway, **$k_{a,b}$** , as a function of the iteration number for the evolving database of tryptophan zipper -I. Note **$\log_{10}$** scale on vertical axis. As the generation of new connections can affect the rates of the pathways explicitly recorded in the pathway database (see Equation E.1), these rates were recomputed at the end of the calculation. This result, and the fact that **$k_{a,b}$** includes recrossing contributions, explains the wiggles in the data. The recrossing contribution was calculated using Algorithm B.1, which is discussed in Chapter 4.
$\begin{psfrags} \psfrag{kABr} [bc][bc]{$\log_{10} k_{a,b}$} \psfrag{Pathway ... ...cludegraphics[width=.50\textheight]{fastest_path/RateABrofn.eps}} \end{psfrags}$

After a set of perturbations was applied to the fastest currently known pathway $a\leftarrow b$ Dijkstra's algorithm was used to solve the single-source shortest paths problem to build the shortest paths tree rooted at minimum . For path $\xi$ the set of perturbations constituted $l(\xi)-\Delta + 1$ attempts to shortcut the path, where $\Delta$ is a shortcut parameter, $\Delta\in\{1,2,\dots,l(\xi)\}$ .

Each attempt to shortcut the path was realised as a single connect run (see Chapter 2) seeded with two minima $\alpha,\beta\in\xi$ separated by a gap of length $\Delta$ along the path. If the fastest path $a\leftarrow b$ found by Dijkstra's algorithm was different (not necessarily new) from the one that was currently being perturbed the iteration was completed, this pathway was perturbed in the next iteration and parameter $\Delta$ was reset to . Otherwise, $\Delta$ was incremented by one, unless $\Delta=l(\xi)$ in which case the calculation was terminated.

Next: Bibliography Up: thesis Previous: TOTAL ESCAPE PROBABILITY FOR Contents

Semen A Trygubenko 2006-04-10