Molecular Dynamics Hands-on Session I

Tutorial on MD simulations using Gromacs 3.3.3

Tsjerk A. Wassenaar

Outline

Molecular dynamics (MD) simulations consist of three stages: First, the input data (the simulation system) has to be prepared. Then the production simulation can be run and finally the results have to be analysed and be put in context. Although the second stage is the most computationally demanding step, with simulations commonly running for several months, the most laborious steps are the preparation and the analysis. This tutorial will provide an example session of setting up a protein for molecular dynamics simulation, running it and analysing the results. The protein used for this session is Bacteriophage T4 Lysozyme (PDB id 1LW9), which is a crystal structure resolved at a resolution of 1.45 Ångström.

This tutorial uses Gromacs (http://www.gromacs.org) version 3.3.3 and most of the commands given are specific to this package. The workflow(s) will be more general though and will also apply to other MD packages. Each step will be presented schematically, as a node with input and output. A step can be the execution of a single program or a combination of programs, with output from one being directed to the other. In the case of a program, the input/output types and flags will be specified. For most of the programs/steps only the input/output options used are listed. Most of the programs offer options for more advanced control, which will not be mentioned here. For those interested, the help page for each program gives a full listing and description of the functionality. The help page can be shown by running a program with the option "-h". In the workflow, programs are indicated by the name on a yellow background. Input/output data is given as a descriptive name on a green background. The blue blocks indicate input data which is essentially not dependent on the structure under study. The orange blocks are processes comprising of several steps in terms of the programs involved. In the present scheme, orange blocks are solely used for the simulation steps, which are explained in more detail below. The lines connect the data to the processes and for each program the command line option for the input or the output is indicated in white on black.

Preparation

The preparation of the simulation system is the most important stage of MD. MD simulations are performed to gain insight into processes on an atomistic scale. They can be used to understand experimental observations, to test hypotheses or as a basis for new hypotheses to be assessed experimentally. However, for each of these cases, the simulations have to be designed such that they are suited for the purpose. This means that the setup should be performed with care.

Selection, validation and editing of the structure

Missing residues, missing atoms

In this tutorial we consider the simulation of a protein. The first step in the sequence then is the selection of a starting structure. This structure has to be checked for missing residues and atoms, and if atoms or residues are missing, these have to be modeled in some way. In the case of a structure from the PDB repository, missing residues and atoms are listed in the file. Download or copy the file 1LW9.pdb and have a look at it. Search for the presence of REMARK entries with numbers 465 and/or 470.

Non-standard residues and ligands

As this session does not cover modeling of a structure, it should not be surprising that there are no residues or side chains missing. However, there is a minor issue that needs to be resolved. The crystal structure was obtained in the presence of potassium, chloride and 2-hydroxethyl disulfide. These are not part of the protein as we want to study it, and, on another note, the disulfide is not in the force field we will be using. For these reasons these molecules have to be removed from the file. Using a text editor, remove the lines refering to potassium (K), chloride (CL) and 2-hydroxethyl disulfide (HED). Alternatively, issue the commands:

mv 1LW9.pdb 1LW9.pdb.org
egrep -v "^HETATM.{11}(HED| CL| K)" 1LW9.pdb.org > 1LW9.pdb

This will look for the pattern specified in each line in the file and retain only those lines which do not match.

Structure quality

It is good practice to run further checks on the structure to assess the quality. For instance, the refinement process in crystal structure determination may not always yield proper orientations of glutamine and asparagine amide groups. Also, the protonation state and side chain orientation of histidine residues may be problematic. To perform a proper validation of the structure, several programs and servers exist (e.g. WHATIF). For this tutorial we will assume that the structure is good enough for our purpose and continue with preparing the simulation system.

Structure conversion and topology

A molecule is defined by the coordinates of the atoms as well as by a description of the bonded and nonbonded interactions. Since the structure obtained from the PDB only contains coordinates, we first have to construct the topology, which describes the system in terms of atom types, charges, bonds, etc. This topology is specific to a certain force field and the force field to be used is one of the issues requiring careful consideration. Here we use the GROMOS96 53a6 united atom force field, which is parameterized to give good predictions of free energies of solvation of amino acid side chains and which generally gives good agreement with NMR experiments.
It is important that the topology matches with the structure, which means that the structure needs to be converted too, to adhere to the force field used. To convert the structure and construct the topology, the program pdb2gmx can be used. This program is designed to build topologies for molecules consisting of distinct building blocks, such as amino acids. It uses a library of building blocks for the conversion and will fail to reckognize molecules or residues not present in the library. Issue the following command to convert the structure; choose the GROMOS 53a6 force field when prompted:

pdb2gmx -f 1LW9.pdb -o 1LW9.gro -p 1LW9.top

Read through the output on the screen and check the choice made for the histidine protonation and the resulting total charge of the protein. Also browse through the input structure file (1LW9.pdb, pdb format) and output structure file (1LW9.gro, gromacs format). Note the differences between the two formats. Also note that the output structure file could as well have been chosen to be in pdb format. Now browse through the topology file and look at the structure.

Energy minimization of the structure (vacuum)

Now a structure is generated in the correct format for the force field chosen, as well as the corresponding topology. However, the conversion of the structure involves the deletion and or addition of hydrogen atoms and may cause strain to be introduced, e.g. due to atoms positioned too close. together. Therefore, we have to perform an energy minimization on the structure, and let it relax a bit. This is in fact a simulation step, and involves two processes. First, the structure and the topology are combined into a single description of the system, together with a number of control parameters. This yields a run input file, which can be used as the single input for the simulation program mdrun. The advantage of splitting the two sub-processes is that the run-input file can be transferred to a dedicated (high-performance) computer network or a supercomputer and be processed there. Obviously, this is only relevant for the production simulation.

To perform these steps first save the file minim.mdp to your directory. This file contains the control parameters for the energy minimization. Have a look at the contents of this file. Notice the line starting with "integrator", which specifies the algorithm to be used. In this case it is set to perform a steepest descent energy minimization of 500 steps. Now, combine the structure, topology and control parameters using grompp:

grompp -v -f minim.mdp -c 1LW9.gro -p 1LW9.top -o 1LW9-EM-vacuum.tpr

grompp will note that the system has a non-zero charge and will print some other information regarding the system and the control parameters. The program will also generate an additional output file, which contains the settings for all control parameters (mdout.mdp). Now, continue with mdrun:

mdrun -v -deffnm 1LW9-EM-vacuum

Since there are only about 2300 atoms, the energy minimization finishes quickly. The -v flag causes the potential energy and the maximum force to be printed at each step, allowing following the minimization. It may be that the program stopped before the 500 steps were finished, in which case the bottom of an energy well was reached. Now that the structure is relaxed, it is time to set up the periodic boundary conditions and add the solvent.

Periodic boundary conditions

Before adding the solvent, a general layout (the space) of the simulation setup has to be chosen. Most commonly simulations are performed under periodic boundary conditions (PBC), meaning that a single unit cell is defined, which can be stacked in a space filling way. In this way, an infinite, periodic system can be simulated, avoiding edge effects due to the walls of the simulation volume. There are only a few general shapes available to set up PBC. We will use a rhombic dodecahedron because it corresponds to the optimal packing of a sphere, and is therefore the best choice for freely rotating molecules. To disallow direct interactions between periodic images we set a minimal distance of 1.0 between the protein and the wall of the cell, such that no two neighbours will be closer than 2.0 nm. Periodic boundary conditions are set with editconf:

editconf -f 1LW9-EM-vacuum.gro -o 1LW9-PBC.gro -bt dodecahedron -d 1.0 

Have a look at the output of editconf, and note the changes in the volume. Also, have a look at the last line of the file 1LW9-PBC.gro (tail -1 1LW9-PBC.gro). In the gromacs format (.gro), the last line specifies the unit cell shape. It always uses the triclinic matrix representation, with the first three numbers specifying the diagonal elements (xx, yy, zz) and the last six giving the off-diagonal elements (xy, xz, yx, yz, zx, zy).

Solvent addition

Now that the unit cell is set up, solvent can be added. There are several solvent models, each of which is more or less intimately linked to a force field. The GROMOS96 force fields are generally used with the simple point charge (SPC) water model. To fill the unit cell with SPC issue the following command:

genbox -cp 1LW9-PBC.gro -cs spc216.gro -p 1LW9.top -o 1LW9-water.gro

The topology is not required for solvent generation, but is updated to include the solvent added. Have a look at the end of the file 1LW9.top to see the addition, and check the numbers with the amount of solvent added according to genbox.

Addition of ions: counter charge and concentration

At this point, we have a solvated protein, but the +8 charge of the system still remains. To make the system neutral we have to add a number of counterions. In addition, it may be considered good practice to add ions up to a certain concentration. The program genion can take care of both tasks, but requires a run input file as input, i.e. a file containing both the structure and the topology. As we saw before, such a file can be generated with grompp:

grompp -v -f minim.mdp -c 1LW9-water.gro -p 1LW9.top -o 1LW9-water.tpr 

Subsequently, the file 1LW9-water.tpr can be used as input for genion. We specify a concentration of 0.1 M (-conc 0.1M) of NA+/CL- (-pname NA+ -nname CL-) to be added and indicate that an excess of one species of ion has to be added to neutralize the system (-neutral). genion will ask for a group of molecules to be used to partly replace with ions. The group 'SOL' should be chosen.

genion -s 1LW9-water.tpr -o 1LW9-solvated.gro -conc 0.1 -neutral -pname NA+ -nname CL- 

Note the numbers of ions added, and verify that an excess of 8 chloride ions is added for neutralization. Having replaced a number of water molecules with ions, the system topology in 1LW9.top is not correct anymore. Edit the topology file and decrease the number of solvent molecules. Also add a line specifying the number of NA+ ions and a line specifying the amount of CL-.

Energy minimization of the solvated system

Now the whole simulation system is defined. The only steps before starting a production run involve the controlled relaxation of the system. The generation of solvent as well as the placement of ions, may have caused unfavourable interactions, e.g. overlapping atoms, equal charges too close together. Therefore, the system is energy minimized again, following the same steps as before, but now with periodic boundary conditions. Edit the file minim.mdp and change the line pbc = no to pbc = xyz. Then run grompp and mdrun:

grompp -v -f minim.mdp -c 1LW9-solvated.gro -p 1LW9.top -o 1LW9-EM-solvated 
mdrun -v -deffnm 1LW9-EM-solvated 

Relaxation of solvent and hydrogen atom positions: Position restrained MD

With the greatest strain dissipated from the system, we now let the solvent adapt to the protein, i.e. we allow the solvent to move freely, while keeping the non-hydrogen atoms of the proteins more or less fixed to the reference positions. This is done to assure that the solvent configuration 'matches' the protein. This step is the first actual MD step. The control parameters for this step can be found in pr.mdp, which should be downloaded. Have a look at this file and note the integrator used as well as the define statement. The latter is used to allow flow control in the topology file. define = -DPOSRES will cause the keyword "POSRES" to be globally defined. Have a look at the end of the topology file to see how this is used to turn on position restraints. Use grompp and mdrun to run the simulation:

grompp -v -f pr.mdp -c 1LW9-EM-solvated.gro -p 1LW9.top -o 1LW9-PR.tpr >&
mdrun -deffnm 1LW9-PR 

Unrestrained MD simulation, turning on temperature coupling

The system should now be relatively free of strain. Time to turn up the heat, and start to couple the system to the heat bath. In other words, a short run will be performed in which the temperature coupling is turned on and the system is allowed to relax to the new conditions. Download the control parameter file nvt.mdp and have a look at the temperature coupling parameters in it. Then run the simulation, using grompp and mdrun:

grompp -v -f nvt.mdp -c 1LW9-PR.gro -p 1LW9.top -o 1LW9-NVT.tpr 
mdrun -deffnm 1LW9-NVT 

Unrestrained MD simulation, turning on pressure coupling

After relaxation of the temperature and the temperature coupling, it is time to start putting things under pressure. Perform a short simulation with pressure coupling turned on, using the control parameter file npt.mdp. Have a look at this file and identify the parameters controlling the pressure coupling.

grompp -v -f npt.mdp -c 1LW9-NVT.gro -p 1LW9.top -o 1LW9-NPT.tpr 
mdrun -deffnm 1LW9-NPT >& 02-mdrun.log

Production Simulation

Now finally, we have a more or less equilibrated solvated system containing our protein of interest and it is time to start a production simulation. Mind that production simulation does not imply that the whole simulation can be used for analysis of the properties of interest. Although some of the memory of the initial configuration has been erased, it is unlikely that the system has already reached equilibrium. In the analysis section we will investigate which part of the simulation can be considered in equilibrium and thus suited for further processing. But first it is time to set up the simulation. This merely involves running another simulation step, similar to the last step of the preparation. However, this is another of these times where it is necessary to think about the purpose of the simulation. The control parameters have to be chosen such that the simulation will allow to analyse the properties one has in mind. Questions to be addressed are:

Download or copy the control parameter file md.mdp and take a look at the contents of it. Take the final topology and structure file resulting from the preparation and combine them into a run input file using grompp.

Although the run input file is in a binary format, it is possible to have a look at the contents. This can be particularly useful when unexpected behaviour is encountered, which could arise from implicit control or force field parameters. The program to make the contents of the run input file readable is gmxdump. The output usually spans many pages and therefore it is advisable to redirect the output to file or to more or less, which allow to browse the content. Issue the following command and inspect the structure of the output (note that you have to replace the name topol.tpr with the name chosen as output for grompp):

gmxdump -s topol.tpr |& less

The simulation can now be run using mdrun in the same way as above. However, running the simulation with the parameters chosen, having a simulation length of 5 ns, would cost too much time. The simulation has be run before and the full set of results can be downloaded. For those following the tutorial at home, be patient, the trajectory file is rather large. After downloading, the file has to be unzipped, after which the analysis part of the tutorial can be started.