Analysis of the active site of nucleotidyl transferases using PyMol


Purpose of this exercise. In this exercise you will learn how to use a highly professional molecular graphics program called PyMol. You will apply PyMol to analyze the catalytic sites with divalent metal cations of nucleic-acid-processing enzymes called nucleotidyl transferases.


Step 0

Reading required in preparation for this practical:

·        W. Yang, J.Y. Lee, M. Nowotny (2006). Making and breaking nucleic acids: two-Mg2+-ion catalysis and substrate specificity. Mol. Cell 22, 5-13.

·        M. Jaskolski, J. Alexandratos, G. Bujacz, A. Wlodawer (2009). Piecing together the structure of retroviral integrase, an important target in AIDS therapy. FEBS J. 276, 2926-2946; sections on "Functional properties of retroviral INs", "The catalytic domain of IN", "Structural basis of the enzymatic activity of IN"

Recommended reading:

·        M. Nowotny, S.A. Gaidamakov, R.J. Crouch, W. Yang (2005). Crystal structures of RNase H bound to an RNA/DNA hybrid: substrate specificity and metal-dependent catalysis. Cell 121, 1005-1016.


Step 1

Get to know PyMol

·        PyMol is an open-source, highly advanced molecular graphics program developed by DeLano Scientific. PyMol has its own Wiki resource, and you should first visit this site at

·        PyMol manuals are available on-line at

·        An excellent site showing advanced uses of PyMol is maintained by Daan Van Aalten at

·        When you launch PyMol, you get an External GUI (Graphical User Interface) (gray window) and a graphics viewer with an Internal GUI on the right. You can control PyMol by selecting commands from the External GUI Menu Bar, by typing commands in the command line, or by executing (Run) a script with PyMol commands. The last mode is very useful because the script records exactly what was done to obtain a given effect. Alternatively, a lot of options are available via the Internal GUI, which can be used to get your picture quickly. In the upper part of the Internal GUI, you always see the different "selections" defined during your session. On those selections, you can perform Actions (A), Show (S) them in different styles, Hide (H) those displayed styles, work with Labels (L), or with Colors (C). To work with a given structure, you must always Open it first, e.g. via the File menu.
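A script is simply a text file with one PyMol command per line. The following is a minimal sketch of what such a script might contain for this exercise. Python is used here only to assemble the text so that the individual lines can be inspected; the function name build_pml, the 3.0 Å cutoff, and the image name site.png are illustrative assumptions, not part of the original exercise.

```python
"""Sketch: assemble a small PyMol command script as plain text."""


def build_pml(pdb_code="1zbl", image="site.png"):
    """Return the text of a PyMol script; each line is one PyMol command.

    pdb_code and image are hypothetical defaults chosen for illustration.
    """
    commands = [
        f"fetch {pdb_code}",                          # download and open the PDB entry (needs network access)
        "hide everything",                            # start from an empty scene
        "select MG, elem Mg",                         # named selection with the magnesium ions
        "show spheres, MG",                           # draw the ions as spheres
        "show sticks, byres (all within 3.0 of MG)",  # residues close to the ions (3.0 is an assumed cutoff)
        "bg_color white",                             # white background for printing
        "ray",                                        # ray-trace the current view
        f"png {image}",                               # write the rendered image to a file
    ]
    return "\n".join(commands) + "\n"


script_text = build_pml()
print(script_text)
```

If the printed lines are saved to a file, for instance site.pml (a hypothetical name), they could be run from the File menu or by typing @site.pml at the PyMol command line.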


Step 2

Get to know our molecule

·        We will look at the active site of an enzyme called RNase H. It is a hydrolase that cuts RNA chains. RNase H is found in different organisms. It is also a domain of the reverse transcriptase (RT) of HIV and other retroviruses. When the RT reverse-transcribes the viral RNA into viral DNA, RNase H is then used to degrade the now-useless RNA template, so that the complementary DNA strand can be synthesized. In the second half of the exercise, you will learn about integrase (IN), which is another retroviral enzyme.


Step 3

Have a look at the PDB

·        In a browser window, open the PDB site.

·        Search for HIV RNase H.

·        Note the size of the protein, who discovered the structure, and when.

·        You can already use PyMol to have a look at HIV RNase H, by a right-button click on the icon near the PDB code, which will allow you to select the (PyMol) program for opening this PDB entry.


Step 4

RNase H in complex with RNA/DNA and Mg2+

·        Ultimately, we want to analyze the structure of RNase H from Bacillus halodurans. It has been determined by Nowotny et al. (Cell 121, 2005:1005-1016) in complex with an RNA/DNA hybrid substrate and magnesium cations (Mg2+), which are necessary for catalysis.

·        Find this structure in the PDB (1ZBL) and display it using PyMol. Before you proceed any further with this exercise, you should familiarize yourself with a review article describing the mechanism of nucleotidyl transferases by Yang et al. (Mol. Cell 22, 2006:5-13).


Step 5

Get an overview of the 1ZBL structure in PyMol

·        In particular, use the preset simple Action to get a line drawing of the molecule. You will note that the structure is dimeric, i.e. consists of two copies of the RNase H subunit. Play with the excellent select/display options of PyMol. For instance, using the right mouse button, you can select a chain to be shown in a preset stick representation, by just one click!

·        For further work, you might want to simplify the scene by, for instance, selecting only the protein chain A and hiding the protein chain B.

·        Find the magnesium ions in the active site. (If you cannot see them, use this trick: type "select MG, element mg" – a new selection, MG, pops up; Show it as spheres, you won't be able to miss the Mg2+ cations now!)

·        Select only those residues and those water molecules which are in the coordination spheres of the Mg2+ cations. Hide everything else.

·        Find a nice view. You can write out the orientation matrix by the following commands:

log_open logfile.log                          opens a logfile

get_view                                             writes out the orientation matrix

log_close                                          closes the logfile

·        Later, you can use this orientation matrix to get exactly the same view using an analogous set_view command (usually issued from within a script, as it contains a rather long list of 18 numbers, printed as six rows of three).

·        You can give the selected residues a nice ball-and-stick representation, for instance, using the Appearance Wizard, which will turn atoms into spheres by simple clicks. You can control the spheres (like anything else in PyMol) by the Setting menu Edit All... option. Editing a given parameter, e.g. sphere_scale or sphere_transparency, will take immediate effect. Try it!

·        In the Mouse menu pick Selection mode as Atoms; this way you will be selecting only those atoms on which you click.

·        Click on the two Mg atoms (green crosses) in succession. A new Selection "(sele)" appears. Change its name under A to Metal. Working in the same way, select successively all residues (only side-chain atoms) and water molecules forming the coordination spheres of the two magnesium cations.  For instance, you might click on the P atom from the RNA backbone which is within the coordination area, and on its four O atoms, and rename this selection as PO4.

·        You can render each such selection as ball-and-stick by using again A with preset ball and stick.

·        When our object of interest (the coordination complex) has been nicely selected and rendered, you might want to hide everything else, by clicking everything under H (hide) at 1ZBL.

·        To draw the coordination bonds you will use the Measurement option of the Wizard menu.

·        Now simply click on each pair of atoms that should be connected by a coordinative bond. (Of course, one point of each such bond will be one of the Mg atoms!) 

·        Each selected bond will be displayed as a dashed line annotated with its length in Å.

·        Analyze the coordination spheres of the two Mg2+ cations. What are the coordination numbers? What are the coordination polyhedra? Which Mg2+ cation has a regular, and which a distorted coordination sphere?

·        The bond distances are very important, but for a nice drawing, you may want to delete the annotations. Simply, for each measurement click on H and hide the label. The coordinative bonds will look nicer as solid lines. Go again to the Settings window, find "dash" and edit the following parameters: dash_gap 0, dash_length 1, dash_radius 0.1.

·        To get a beautiful appearance of the drawing, run the ray tracing procedure by clicking Ray in the External GUI. (Ray tracing is a rendering procedure that handles a three-dimensional object in the computer by shining light on it and tracing the reflections of the rays from the points of the object; in this way one gets shiny spots, as well as shadows, much as with natural illumination.)

·        If you want to print your figure (Yyyyyyyeeees!!!), it will look better on white background, which you can select from the Display menu.

·        The figure can also be printed in stereo. For this purpose, you have to prepare two drawings (*.png files), one for the left eye (-left) and one for the right eye (-right). (This is for "wall" or parallel viewing; if you prefer to cross your eyes, as I do, simply swap the left/right drawings.)

ray angle=+3                                   renders the scene from the left eye's point of view

png left.png                              writes a file (left.png) with the current scene

ray angle=-3                                   renders the scene from the right eye's point of view

png right.png                            writes a file (right.png) with the current scene

·        You can then paste the left.png and right.png files into Word or PowerPoint, or preferably into a graphics program such as Corel, for printing. For proper viewing (from a distance of 30-40 cm with normal pupil separation), the components of the stereo pair should be 60 mm apart (distance between equivalent points).
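The questions posed in this step about coordination numbers and coordination polyhedra can also be examined numerically, once atomic coordinates are at hand. The following is a minimal sketch, not a PyMol feature: it counts the atoms within a cutoff of a metal ion and measures how far the ligand-metal-ligand angles deviate from an ideal octahedron (twelve angles of 90° and three of 180°). The coordinates are made up (an idealized octahedron with 2.1 Å bonds, not taken from 1ZBL), and the 2.5 Å cutoff and all function names are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def angle_deg(center, a, b):
    """Angle a-center-b in degrees."""
    va = [p - c for p, c in zip(a, center)]
    vb = [p - c for p, c in zip(b, center)]
    cos_ab = sum(x * y for x, y in zip(va, vb)) / (math.hypot(*va) * math.hypot(*vb))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_ab))))


def coordination_sphere(metal, atoms, cutoff=2.5):
    """Return (distances of atoms within cutoff, sorted ligand-metal-ligand angles)."""
    ligands = {name: xyz for name, xyz in atoms.items()
               if math.dist(metal, xyz) <= cutoff}
    distances = {name: math.dist(metal, xyz) for name, xyz in ligands.items()}
    angles = sorted(angle_deg(metal, a, b)
                    for a, b in combinations(ligands.values(), 2))
    return distances, angles


def octahedral_rms_deviation(angles):
    """RMS deviation (degrees) of 15 angles from the ideal 12 x 90 and 3 x 180."""
    if len(angles) != 15:
        raise ValueError("six ligands are needed, i.e. 15 ligand-metal-ligand angles")
    ideal = [90.0] * 12 + [180.0] * 3
    # Heuristic: the 12 smallest angles are compared with 90, the 3 largest with 180.
    return math.sqrt(sum((a - i) ** 2 for a, i in zip(sorted(angles), ideal)) / 15)


# Made-up coordinates in Å (hypothetical, for illustration only).
metal = (0.0, 0.0, 0.0)
atoms = {
    "O1": (2.1, 0.0, 0.0), "O2": (-2.1, 0.0, 0.0),
    "O3": (0.0, 2.1, 0.0), "O4": (0.0, -2.1, 0.0),
    "O5": (0.0, 0.0, 2.1), "O6": (0.0, 0.0, -2.1),
    "O7": (3.5, 1.0, 0.0),  # beyond the cutoff, so not counted as a ligand
}

distances, angles = coordination_sphere(metal, atoms)
print("coordination number:", len(distances))
print("RMS deviation from octahedral angles:", round(octahedral_rms_deviation(angles), 3))
```

With real coordinates, a small RMS deviation would point to a regular octahedral sphere and a large one to a distorted sphere; a coordination number other than six would call for a different reference polyhedron.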


Step 6

Now comes the real thing!

·        Load the PDB file with the atomic coordinates of a cadmium (Cd2+) complex of the catalytic domain of retroviral integrase (1VSJ).

·        Integrase is the third enzyme (in addition to reverse transcriptase and protease) encoded by the (RNA) genome of retroviruses. Retroviral integrase consists of three rather loosely connected domains: an N-terminal domain with a zinc-finger motif, a central (core) catalytic domain, and a DNA-binding C-terminal domain. The structure of the complete integrase is unknown, but the structure of the catalytic core domain is known very well from very precise crystallographic studies of the protein from HIV and from Rous sarcoma, or more correctly Avian Sarcoma, Virus (ASV). The (catalytic domain of) integrase performs a sequence of important functions. It binds the viral DNA synthesized (by RT) in the cytoplasm and transports it into the nucleus of the infected cell. In the nucleus, it catalyzes two highly orchestrated reactions, both of which require DNA cleavage. The first reaction, called processing, removes two nucleotides from each 3' end of the viral DNA, exposing 3'-OH groups at a characteristic viral DNA sequence. In the second reaction, called joining, the host DNA is cut on both strands (at a characteristic distance of 5-6 bp, called the stagger), and the exposed 3' ends of the viral DNA are ligated to the host genomic DNA. The integration is completed by cellular DNA repair enzymes, which add the 5-6 nucleotides missing at each integration site as a result of the staggered cut and seal the remaining gaps (at the free 3' ends of the cut host DNA). Note that through the action of the integrase, retroviral infection becomes permanent in the affected cell, as the viral DNA, now called "a provirus", is encoded within the cellular genome! To make the situation even more devastating, integration (i) can occur at numerous places within one nucleus, (ii) additionally damages the genetic information at the site of integration, and (iii) can take place at apparently random, non-specific sites of the host genome.
Because the integrase breaks and forms phosphodiester bonds of the DNA backbone, and in particular because a DNA strand transfer takes place in the joining reaction, it can be classified as a nucleotidyl transferase.

·        Therefore, in the 1VSJ structure we have another example of a nucleotidyl transferase. For its catalytic activity, integrase has an obligatory requirement for divalent metal cations, manganese (Mn2+) or magnesium (Mg2+), the latter being the metal used in vivo. Under experimental conditions, the catalytic domain of retroviral integrase can also bind other divalent metal cations, such as cadmium, which are useful for studying the structural aspects of the metal-assisted catalysis of this enzyme.


Step 7

Superposing the two active sites

·        Your task now is to superpose the active site of integrase on that of RNase H. This is not trivial, but it is possible. First, we note some obvious chemical similarities between the two active sites. (i) Both are arranged around two divalent metal cations. (ii) The separation of the cations is the same (about 4 Å). (iii) The active site is formed by a constellation of acidic residues; in the retroviral integrase they form the so-called D,D(35)E sequence signature, with two aspartic acids (D64 and D121 in ASV integrase) and a glutamate (E157 in ASV), separated from the second aspartate by exactly 35 residues. (iv) The coordinating ligands are only O atoms. (v) In both cases, the two metal centers are "bridged" by two common ligands: an aspartate (D64 in ASV integrase) and another O atom, from a water molecule in the 1VSJ structure (integrase) or from a phosphate group contributed by the RNA substrate in the 1ZBL structure (RNase H).

·        There is a simple tool in PyMol for superposing two structures according to best fit of the coordinates of selected atom pairs (pairs coming from the two structures being aligned, that is). To accomplish this, select Pair Fitting in the Wizard menu. A submenu appears in the lower part of the Internal GUI and a prompt is displayed in the upper left corner of the graphics viewer. Let's say that we want to superpose the integrase active site (from 1VSJ) onto the active site of RNase H (from 1ZBL). At each prompt for "mobile atom" select an atom from the 1VSJ structure and then click on the corresponding atom in the 1ZBL structure ("target atom"). For instance, we can "pair" each of the Cd2+ ions with their Mg2+ counterparts, and similarly the O atoms in the bridging carboxylate groups. Each selected pair of atoms gets connected by a yellow line. Before the fit is calculated the lines will be pretty long. When you have selected the four atom pairs as described, you can click on "Fit 4 Pairs" in the Internal GUI submenu. The superposition matrix will be calculated and the "mobile" structure will be superposed on the "target" structure. (If during the Pair Fitting you make a mistake, you can Delete Last Pair or Clear the counter and start again.)
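Pair Fitting performs a least-squares superposition inside PyMol. As a rough check on whether a chosen set of atom pairs can fit well at all, one can compare the interatomic distances within the two sets, which do not depend on how the structures are oriented. The sketch below computes such a distance RMSD for two alternative pairings of the metal ions. It is a simpler stand-in, not the calculation that PyMol performs, and the coordinates are made up (the "mobile" set is just a rotated and shifted copy of the "target" set); the function name distance_rmsd is likewise an assumption.

```python
import math
from itertools import combinations


def distance_rmsd(mobile, target):
    """RMS difference between corresponding interatomic distances.

    mobile and target are equally long lists of (x, y, z) tuples, ordered so
    that mobile[i] is paired with target[i]. No superposition is needed.
    Note: this measure cannot distinguish a structure from its mirror image.
    """
    if len(mobile) != len(target) or len(mobile) < 2:
        raise ValueError("need two equally long lists with at least two atoms")
    diffs = [math.dist(mobile[i], mobile[j]) - math.dist(target[i], target[j])
             for i, j in combinations(range(len(mobile)), 2)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up coordinates in Å (hypothetical, not taken from 1ZBL or 1VSJ).
# Order: metal A, metal B, bridging O1, bridging O2.
target = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (1.5, 1.5, 0.0), (2.0, -1.0, 1.0)]
# The same site rotated by 90 degrees about z and shifted by (10, 5, -3).
mobile = [(10.0, 5.0, -3.0), (10.0, 9.0, -3.0), (8.5, 6.5, -3.0), (11.0, 7.0, -2.0)]

# Alternative pairing: the two metal ions swapped, oxygen pairs unchanged.
swapped = [mobile[1], mobile[0], mobile[2], mobile[3]]

print("pairing as listed:", round(distance_rmsd(mobile, target), 3))
print("metals swapped:   ", round(distance_rmsd(swapped, target), 3))
```

A clearly lower value for one pairing suggests that it is the better correspondence to try in the Pair Fitting wizard. In scripts, the fit itself can be requested with PyMol's pair_fit command, which takes the mobile and target atom selections in alternating order.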


Step 8


·        Have a good look at the results of your work. Do you think you have chosen the correct Cd-Mg pairs for best correspondence? Or perhaps the other alternative would be better? Do you think the match of the active sites is close enough? Is it possible that the catalytic mechanism of integrase is based on two divalent metal cations in a way similar to that described for other nucleotidyl transferases? If yes, what would be the roles of the two metal ions in the active site of integrase?

·        Write down your observations and conclusions.

·        Print a figure with the illustration of your work.

·        As a take-home lesson, remember the amazing amount and level of detail of the information that can be gleaned from the three-dimensional structures of macromolecules. Reflect on the methods by which these structures are determined experimentally. Reflect on the chemical principles underlying the functioning of macromolecules, for example the coordination of metal ions by enzymes that break and form P-O bonds in nucleic acids.


Mariusz Jaskolski, 06.12.2009