Umbrella sampling and weighted histogram analysis (WHAM) in GROMACS with PLUMED2
In this tutorial we will use GROMACS patched with PLUMED2 to compute the potential of mean force (PMF) using umbrella sampling and weighted histogram analysis (WHAM). As an example system we will compute the PMF between two methane molecules in water. This tutorial was created using GROMACS-2019.4 and PLUMED2.7.0. The necessary files to run this tutorial can be found here.
In umbrella sampling, multiple simulations are performed in which the potential energy of the system is biased with a harmonic potential of the form
\[\begin{equation} \label{EQ:1} V_i(\xi)=\frac{1}{2}K(\xi−\xi_i)^2 \end{equation}\]where \(\xi\) is the reaction coordinate. Different umbrella windows are employed with the potential centered on successive values of \(\xi_i\). The region sampled in the presence of the biasing potential is called a window. Configurations within each biased simulation \(i\) are sampled according to the distribution
\[\begin{equation} P_i'(R) \propto P(R)e^{-\frac{K(\xi−\xi_i)^2}{2 k_BT}} \end{equation}\]where R are the atomic coordinates, \(k_B\) is the Boltzmann constant, \(T\) is the temperature, and \(P(R)\) is the Boltzmann distribution
\[\begin{equation} P(R) \propto e^{-\frac{U(R)}{k_BT}} \end{equation}\]The sampling in each umbrella window is confined to a narrow region around \(\xi_i\) due to the harmonic constraint, forcing the system to explore only a predefined region of the reaction coordinate. By combining simulations performed with different values of \(\xi_i\), we can compute the continuous potential of mean force along the reaction coordinate.
The weighted histogram analysis method (WHAM) provides a scheme for obtaining the optimal estimate of the unbiased probability distribution from the biased probability distributions due to the biasing potentials. The WHAM equations are
\[\begin{equation} \label{EQ:WHAM1} P(\xi) = \frac{\sum_{i=1}^{N_w} n_i P_i'(\xi)}{\sum_{j=1}^{N_w} n_j e^{-\frac{[V_j(\xi)-F_j]}{k_BT}}} \end{equation}\] \[\begin{equation} \label{EQ:WHAM2} e^{-F_j/k_BT} = \int e^{-V_j(\xi)/k_BT} P(\xi) d\xi \end{equation}\]where \(N_w\) is the number of sampling windows, and \(n_i\) is the number of configuratoins for a biased simulation. Equations \(\ref{EQ:WHAM1}\) and \(\ref{EQ:WHAM2}\) must be solved self-consistently since the \(F_j\)s are unknown.
In this section we will demonstrate the GROMACS implementation of WHAM. This part of the tutorial follows closely that from the Svedružić lab. First we need to set up our system. You will need a pdb file for a single methane molecule, here called methane.pdb. We will use the OPLS force field parameters for methane and the TIP4PEW water model. First you need to copy the methane.rtp file to the oplsaa.ff directory (usually in /usr/local/gromacs/share/gromacs/top/). Now we can use pdb2gmx to create a GROMACS .gro and .top file:
When prompted select the OPLS-AA/L all-atom force field and the TIP4PEW TIP 4-point with Ewald water model. Running the pdb2gmx command will create a topology file topol.top and a conf.gro structure file.
Now we will insert two methane molecules randomly in a 0.5 nm box as follows:
This ensures the inserted molecules are not too far apart in our initial configuration. Now we will make the box bigger to the desired box size (for this tutorial we need a box size of at least 3.2 nm):
Open the topol.top file in a text editor of your choice and update the last line to the correct number of methanes from 1 to 2.
Then add water to the box:
Finally, to perform umbrella sampling in GROMACS we will need to create an index file using make_ndx to make two groups that will be used in specifying the restraining potential. Here we will create a group containing just one carbon from each of the two methanes and name them CA and CB respectively.
We are now ready to perform the umbrella sampling simulations. We will perform 27 simulation windows with a harmonic restraint center at 0.05 nm to 1.3 nm. The are different ways in which we could generate our initial configurations for each window. In this tutorial, we will perform a two-stage energy minimization of each window. In the first stage we minimize the energy of the system with a maximum of 1,000 steps. We have included the flag define = -DFLEXIBLE, which signals to use flexible water, since by default all water models are rigid using an algorithm called SETTLE. In the second minimization we remove the define = -DFLEXIBLE and increasing the maximum number of steps to 50,000 to get the molecules in a good starting position. Then, we will equilibrate each window for 100 ps in the NVT ensemble followed by a 1 ns equilibration in the NPT ensemble.
Since the umbrella potential is applied during the equilibrations, the methanes should get to the correct distances during the equilibration stage due to the bias potentials. Lastly, we will run a 5 ns production run.
All the .mdp files are provided in the directory mdp-pull. The important thing to notice in the .mdp files is the introduction of the COM pulling feature of GROMACS. This is set with the lines
The script run-windows-umbrella-gmx.sh will replace the WINDOW keyword in the .mdp files and launch all the simulations for the separate windows:
In the above script we are simulating 27 windows from 0.05 nm to 1.3 nm in distance. Note that we will need the pull force and pull distance for each step, and this is specified with the -pf and -px flags. Notice we also need the index file specified with the -n flag in order to get the groups for the pull parameters.
From the directory containing your conf.gro, topol.top, and index.ndx files and the mdp-pull directory, run the command:
This will replace the WINDOW keyword with the appropriate values and launch all the simulations. After all the simulations finish (this will take a while), we can use the gmx wham feature in GROMACS to get the PMF. This program requires a list of the .tpr files and another file containing a list of the .xvg files containing the force as arguments. In the terminal type:
And then you can run gmx wham:
After running gmx wham the outputting file profile.xvg contains the PMF. Because our coordinate was the interparticle distance in Cartesian coordinates:
\[\begin{equation} \xi=\sqrt{(\textbf{r}_1-\textbf{r}_2)^2} \end{equation}\]the transformation to spherical polar coordinates with \(r\), \(\theta\) , \(\phi\) of particle 2 with respect to particle 1 results in a term \(2 k_BT ln (r)\) in the potential of mean force:
\[\begin{equation} W(r)=-k_BT \ln{(P(r))} + 2 k_BT \ln{(r)} + constant \end{equation}\]This is due to the increasing volume of configuration space as the distance between particles increases leading to this extra entropic term. See for example, Trzesniak, et al. Chem Phys Chem 2006
For the radial PMF, a correction of 2kTln(r) needs to be added to the data in profile.xvg. Additionally the constant should be chosen to shift the data such that its tail goes to zero. In this case, adding a constant of 69 works, but yours may be different. To plot this in gnuplot do the following in a gnuplot terminal:
Here is an example of the PMF between methane molecules:
Notice that the shifted PMF goes to zero at large distances and at short distances is repulsive. At intermediate distances the potential oscillates between attractive and repulsive components corresponding to solvation shells in the liquid structure factor.
To see this, we can recall that the radial distribution function is related to the PMF according to:
\[\begin{equation} g(r) = e^{-\beta w(r)} \end{equation}\]Using the data from the PMF, here is a plot of the radial distribution function:
The other output from the gmx wham program is histo.xvg which shows the histogram of the sampled distances in each window. This is helpful in determining if there is enough overlap between windows. Here is a plot of each histogram for this simulation:
Clearly all windows are overlapping sufficiently. If there were gaps in the sampling, we could choose a smaller window size or pick specific spots that were missing to simulate.
In this second part of the tutorial we will repeat the above example using PLUMED2 to define the reaction coordinate and perform the biasing. Here we write a plumed.dat file to define the reaction coordinate and specify the bias potential. In our plumed.dat file we have the following lines:
Here we are defining the reaction coordinate to be the distance between the methane carbons and we are using a harmonic restraint. Like before, the keyword WINDOW will be replaced in our bash script for each window. A new set of .mdp files without the COM pulling commands are available here. These are identical with the .mdp files used above, but without the COM pulling commands since we are now performing the biasing using PLUMED2.
As in the previous example, we will use a bash script to launch all the simulations. The script run-windows-umbrella-plumed.sh will replace the WINDOW keyword in the plumed.dat file and launch all the simulations for the separate windows:
In the above script we are again simulating 27 windows from 0.05 nm to 1.3 nm in distance. The bias and coordinates from the production runs are written to the output COLVAR-prd.0, …, COLVAR-prd.27 files. The final line writes a metadata.dat file that will be used by the WHAM code.
A useful stand alone code that can perform WHAM for us is available here from the Grossfield Lab. Read the documentation, and after downloading and installing the wham code, make sure you have the executable wham in your PATH.
We will need the metadata.dat file containing the information about the coordinate files and the bias parameters for each window. This file should contain a column with the name of the COLVAR file, followed by a column with the location of the bias minimum for that simulation, and a column with the value of the spring constant, as follows:
We execute the wham code in the terminal with:
Here the first two arguments specify the boundaries of the histogram, and the third argument is the number of bins to use in the histogram. The next argument is the tolerance followed by the temperature in Kelvin (298.15 K in this case). The next value is the padding value for periodic free energy curves which is set to 0 for aperiodic reaction coordinates. Finally we specify the name of the metadata file and the output free energy file.
Again we can plot the output free energy as before, shifting by a constant such that the PMF goes to zero at large displacement:
Below is the PMF computed using the WHAM code as compared to the one we obtained above within GROMACS: