This tutorial follows the complete process of parametrising a new molecule within the MARTINI forcefield, covering aspects of mapping design, model generation and model validation.
PyCGTOOL is used at multiple stages, showing its use in several different situations.
All files required to follow this tutorial are present in the
docs/tutorial_files directory of the PyCGTOOL repository.
The molecule chosen as a target for this parametrisation is the beta-receptor antagonist atenolol.
The reference simulation for the parametrisation of atenolol was performed using the GROMOS 54A7 united atom forcefield with a topology from the ATB database.
A single molecule of atenolol was solvated and equilibrated, before collecting a 50 ns trajectory using the GROMACS molecular dynamics simulator.
A reduced copy of this trajectory is provided in the tutorial files (as
ref.gro contains the initial structure) since the original is prohibitively large.
Designing a suitable mapping from the atomistic to the coarse-grained representation requires some experience and a degree of ‘chemical intuition’, but the ease with which the mapping may be modified using PyCGTOOL allows the mapping to be iterated much more quickly than if the construction of the CG model were to be performed manually.
The process of designing a mapping involves splitting the molecule into fragments, each of which contains approximately four heavy atoms.
Start by finding functional groups such as amides or carboxylates, each of which may become a single bead.
Next, replace phenyl rings with a triangle of three
small type beads, each of which contains two heavy atoms and has reduced Lennard-Jones size and mass, as compared to the normal four-atom beads.
Finally, divide any remaining parts of the molecule into beads of four heavy atoms as required.
The ideal bead will contain four heavy atoms and be nearly spherical, but this is not always possible.
If any atoms remain after clusters of four have been allocated, it may be required to use a mapping of three-to-one for some beads.
After the atoms have been allocated to beads, determine which beads should be bonded by inspection of the atomistic structure; bonds between functional groups in the atomistic structure become bonds between beads in the CG structure. It will probably be the case that there is no obviously best mapping and bond topology, which is not at this point a major issue as multiple mappings can be assessed easily.
Once the mapping and bond topology have been determined, they must be put into a format readable by PyCGTOOL. This format is as described in the introduction to PyCGTOOL. The mapping file must contain a molecule header which is the residue name enclosed in square brackets. Each subsequent line represents a single CG bead and is of the structure (where items in square brackets are optional):
<bead name> <MARTINI type> [<bead charge>] <component atom 1> [<component atom 2> ...]
The bonding file has a similar structure with a molecule header followed by lines defining each bond in the format:
<bead 1> <bead 2>
It is not necessary to define angles since PyCGTOOL is able to infer these from the bond network. If only a subset of the possible angles require parameters then they may be listed, preventing automatic enumeration of all possible angles.
The mapping used in the paper to describe atenolol (just one of the several possible mappings) is described by the following input files.
; atenolol.map [36KB] N1 P1 N1 C12 C13 C14 O1 P1 O3 C1 C2 C3 O2 SP3 O2 C4 C1 SC3 C5 C6 C2 SC3 C8 C9 C3 SC3 C7 C10 N2 P5 C11 O1 N2
; atenolol.bnd [36KB] N1 O1 O1 O2 O2 C1 O2 C2 C1 C2 C1 C3 C2 C3 C3 N2
The process of model generation after having created the mapping and bond definition files is automated by PyCGTOOL. In the simplest case, a parameter set may be generated simply by passing the four input files to PyCGTOOL:
pycgtool ref.gro ref.xtc -m atenolol.map -b atenolol.bnd
This will create two output files
out.gro, the mapped CG coordinates, and
out.itp, the calculated CG model parameters.
Running the CG Simulation¶
Note: These instructions assume the use of GROMACS 5.0 or newer.
From this stage the usual preparation of MARTINI/GROMACS simulations applies.
The output coordinates
out.gro must be solvated using the GROMACS tool
gmx solvate with the options:
gmx solvate -cp out.gro -cs ../../data/water.gro -o solv.gro -radius 0.21
When we run this, GROMACS will tell us how many solvent molecules have been added - remember this number!
Since MARTINI water cannot be automatically added to the
.top file, this must be done manually.
A template file,
template.top, is provided.
Copy this to
topol.top and add a new line to the bottom with a
W, a space, then the number of solvent molecules added by
The three stages of simulation: minimisation, equilibration, and production may then be run:
gmx grompp -f em.mdp -c solv.gro -p topol.top -o em.tpr gmx mdrun -deffnm em gmx grompp -f npt.mdp -c em.gro -p topol.top -o npt.tpr gmx mdrun -deffnm npt -v gmx grompp -f md.mdp -c em.gro -p topol.top -o md.tpr gmx mdrun -deffnm md -v
These simulations should take a few minutes on a modern desktop.
It is recommended to perform validation before using a generated CG model for production simulations, so that we may have confidence in its ability to replicate the behaviour of the molecule being studied. The methods of validation applied in the PyCGTOOL paper are a comparison of the distribution of bonded term measurements between the CG test simulation and the atomistic reference simulation, and a comparison of the radius of gyration between these two simulations. Additionally, other methods of validation should be applied relevant to the class of molecule being studied; for instance, validation of membrane lipids should compare the membrane thickness and surface area per lipid to the atomistic reference.
To compare the distribution of bonded terms, we must first rerun PyCGTOOL to generate samples of the bonded measurements. For the atomistic reference simulation, this can be done by running:
pycgtool ref.gro ref.xtc -m atenolol.map -b atenolol.bnd --dump-measurements
PyCGTOOL will output a sample of each measured bond length and angle, in the files
Since the reference trajectory is short this will give us the measurements from every frame, but for longer trajectories it gives us a sample of 10,000 values to make the analysis quicker.
Since we will be collecting samples of the same measurements from the CG simulation, these files should be renamed to, for instance,
Next, we collect the same samples for the CG simulation using:
pycgtool md.gro md.xtc -b atenolol.bnd
Since we provide a bond file, but not a mapping file, PyCGTOOL will know that this is intended to collect bond measurements and will set the
--dump-measurements option automatically.
Again, the files created will be called
These samples were compared in the paper using an R script to generate a series of boxplots, but a simpler Python script is provided which may be used to compare the mean and standard deviations of the samples:
./average_columns.py ref_length.dat 36KB_length.dat ./average_columns.py ref_angle.dat 36KB_angle.dat
If the automatically generated parameters provide an accurate representation of the reference structure, the percentage error between the two samples will be small.
Validation of the more general molecular conformation may be performed by comparison of the radius of gyration of the reference and CG models
This may be performed using the standard GROMACS too
gmx gyrate -f ref.xtc -s ref-for-rgyr.tpr -o ref-gyr.xvg gmx gyrate -f md.xtc -s md.tpr -o cg-gyr.xvg
In both cases select the 36KB group as the one on which to perform the calculation.
These commands will calculate the radius of gyration for each trajectory frame for both the reference and CG simulations.
.xvg files may be visualised using a graphing program such as
xmgrace or compared in the same way as the bonded samples, using:
./average_columns.py ref-gyr.xvg cg-gyr.xvg
As before, a small percentage difference in each of the columns suggests good replication of gross conformation. Note that the first column is the time in picoseconds, so don’t worry if that column doesn’t match - the other four columns are the total radius of gyration, and components along the X, Y and Z axes respectively.
In addition to these simple forms of validation, it is recommended that further validation, relevant to the class of molecule, is performed. In the case of membane lipids, for instance, this may take the form of an assessment of membrane thickness and surface area per lipid.