This is a quick summary of the two main parts of our current research.

Comparative Genomics: I compared 123 cell wall associated proteins from Bacillus subtilis with the proteins in 60 complete proteomes from a diverse collection of bacteria, including Gram-positive and Gram-negative species. Similar proteins found in many species of bacteria may be suitable targets for broad spectrum antibiotic drugs, for at least two reasons: (1) proteins found in more species of bacteria are probably more important, and more likely to be essential, for bacterial survival than uncommon proteins; (2) if the normal function of such a protein can be inhibited by a chemical compound (drug), then that compound might be effective against a broad spectrum of bacteria. Broad spectrum antibiotics may be appropriate where the cause of an infection is unknown, for example during an emergency in which there is not enough time to make a specific diagnosis.

The cell wall is an essential structure for virtually all bacteria, as it protects the cell from damage and osmotic lysis; it is the target of our best antibiotics (Leaver et al., 2009).

Given the importance of the cell wall to most bacteria and the lack of a cell wall in human cells, I focussed my research efforts on cell wall proteins. However, the cell wall is quite complex and there are over 100 cell wall associated proteins in B. subtilis. Starting with this well known model organism, I compared 123 B. subtilis cell wall protein sequences with proteins in 60 species of bacteria (selected for diversity and pathogenicity).

Proteins with similar sequences tend to have similar 3D structures and similar functions in each organism (Rost, 1999). It is often said that sequence determines structure, which in turn determines function, but small differences in sequence can make big differences in structure and, even more confusingly, very different sequences can fold into 3D structures that have the same function because they have the same active site (the place where other molecules bind to the protein) (Lesk, 2008). As a result, sequence analysis and predictions based on genomic data are often more about probability than certainty. Nonetheless, using BLAST (Altschul et al., 1997) for pairwise sequence alignment and the Protein Data Bank for 3D structure information, I generated a BLAST results table and selected four proteins that are broadly conserved and have a known 3D structure.

Four Proteins
Gene name   UniProt #   PDB ID
ftsZ        P17865      2VXY
coaD        O34797      1O6B
ywtF        Q7WY78      3MEJ
racE        P94556      1ZUW

The BLAST results table only includes hits where a 3D structure is currently available. Sequence comparison results for all B. subtilis proteins with a similar protein in at least 10 of the 60 proteomes may be found here.
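The "at least 10 of the 60 proteomes" filter can be sketched in a few lines of Python. This is only an illustration of the counting step, not the script actually used: the tuple layout (query, proteome, E-value), the E-value cutoff of 1e-5, and the toy hits are all assumptions made for the example.

```python
from collections import defaultdict

def conserved_queries(hits, min_proteomes=10, max_evalue=1e-5):
    """Count, for each query protein, the distinct proteomes with a
    qualifying hit, and keep queries found in at least min_proteomes.

    hits: iterable of (query, proteome, evalue) tuples (assumed layout).
    max_evalue: hypothetical significance cutoff, not taken from the study.
    """
    proteomes_by_query = defaultdict(set)
    for query, proteome, evalue in hits:
        if evalue <= max_evalue:
            proteomes_by_query[query].add(proteome)
    return {
        query: len(proteomes)
        for query, proteomes in proteomes_by_query.items()
        if len(proteomes) >= min_proteomes
    }

# Toy hits, invented purely to exercise the function.
toy_hits = [
    ("ftsZ", "proteome_1", 1e-80),
    ("ftsZ", "proteome_2", 1e-120),
    ("ftsZ", "proteome_2", 1e-3),   # weak second hit, filtered out
    ("ywtF", "proteome_2", 1e-40),
    ("racE", "proteome_1", 0.5),    # not significant, filtered out
]
result = conserved_queries(toy_hits, min_proteomes=2)
print(result)
```

With the real data, min_proteomes would be left at 10 and the hits would come from the 60 proteome comparisons.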

Virtual screening: I downloaded the Natural Products Data set (NPD), about 90,000 ligands, from the ZINC online database (Irwin and Shoichet, 2005). ZINC contains over 13 million purchasable compounds represented by text files that include 3D coordinates in a format well suited to "docking" ligands with proteins in silico. This is an extremely cost effective way to screen vast numbers of ligands virtually and eliminate those with no drug potential from further consideration.

However, virtual screening is computationally intensive, and screening even 100,000 ligands would take too long on an ordinary PC. So I used the CISBAN computing cluster at Newcastle. At the time, access to this cluster's 88 processors was managed by software known as CONDOR (Thain et al., 2005). I used Perl scripts (mk.pl, dock.pm, go.pl) to automate the submission of thousands of batch jobs to cisbclust via CONDOR. This allowed us to dock about 100,000 ligands with our four proteins using AutoDock Vina (Trott and Olson, 2010) in a small fraction of the time that would have been required by one PC.
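The batching idea can be illustrated with a short Python sketch. It does not reproduce mk.pl, dock.pm, or go.pl, whose contents are not shown here; it only splits a ligand list into batches (one batch per cluster job) and builds an AutoDock Vina command line for each ligand. The file names, the batch size, and the assumption that the search box is defined in a Vina config file are all hypothetical.

```python
from pathlib import PurePosixPath

def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def vina_command(receptor, ligand, config, out_dir):
    """Build the argument list for docking one ligand with AutoDock Vina.

    The config file is assumed to define the search box; the output
    naming scheme is an illustrative choice.
    """
    stem = PurePosixPath(ligand).stem
    out_path = str(PurePosixPath(out_dir) / f"{stem}_out.pdbqt")
    return ["vina", "--config", config, "--receptor", receptor,
            "--ligand", ligand, "--out", out_path]

# Hypothetical ligand files; a real run would list roughly 100,000 of them.
ligands = [f"ligands/ligand_{i:03d}.pdbqt" for i in range(5)]
batches = make_batches(ligands, batch_size=2)

# Commands for the first batch; each batch would become one cluster job.
commands = [vina_command("ftsZ.pdbqt", lig, "ftsZ_box.txt", "results")
            for lig in batches[0]]
for cmd in commands:
    print(" ".join(cmd))
```

Grouping several ligands into one job keeps the number of submissions in the thousands rather than one per ligand, which limits scheduling overhead.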

Results: Judged by docking score, NPD ligands from the ZINC database bind best to FtsZ, with CoaD in second place. This chart shows the worst (max), the average (mean), and the best (min) binding score for each protein. These scores are in kcal/mol and, odd as this may sound, the more negative the better, because they estimate the free energy of binding. Based on the work completed so far, Demuris Ltd. has purchased the NPD chemical compounds with the greatest probability of success for in vitro testing. In theory at least, our research has provided a rational and principled basis for prioritizing relatively expensive wet lab experiments at virtually no cost per chemical compound (ligand) considered.
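The per-protein summary and the selection of top candidates amount to a minimum, a mean, a maximum, and an ascending sort. The sketch below shows this under stated assumptions: the protein and ligand names and every score are invented placeholders, not results from this study, and the helper names are hypothetical.

```python
from statistics import mean

def summarize(scores_by_protein):
    """Return best (min), mean, and worst (max) score per protein.

    Scores are in kcal/mol, so the most negative value is the best.
    """
    return {
        protein: {"best": min(scores), "mean": mean(scores), "worst": max(scores)}
        for protein, scores in scores_by_protein.items()
    }

def top_ligands(scored_ligands, n):
    """Return the n ligands with the most negative scores."""
    return sorted(scored_ligands, key=lambda pair: pair[1])[:n]

# Placeholder values only; they do not come from the docking runs.
toy_scores = {
    "protein_A": [-9.0, -7.0, -5.0],
    "protein_B": [-6.0, -4.0],
}
summary = summarize(toy_scores)
ranking = sorted(summary, key=lambda protein: summary[protein]["mean"])

toy_ligands = [("lig1", -7.5), ("lig2", -9.1), ("lig3", -6.0)]
shortlist = top_ligands(toy_ligands, 2)
print(summary, ranking, shortlist)
```

Ranking by the mean is one possible choice; ranking by the best score would favour proteins with a few very strong binders.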