The CANDO plan
[everyone please copy in from relevant emails]
Within 2 years, we need to be able to produce sets of predicted hits/leads for any arbitrary disease, ranked by the probability (or some other measure) that they will work. That is, someone should be able to specify a set of structures (or a disease that selects a set of structures), and we need to be able to say which compound (or set of compounds) will most likely bind, inhibit, or work in a druglike fashion against that entire collection: the best, the second best, the third best, and so on. This compound/structure list is the Matrix, and the first version of it (based on the Dunbrack 90 set + DrugBank + some random compounds) should be done by this time. I think it's doable.
For targets where we have downstream collaborators, especially those who will do testing in the clinic, we will then quickly verify our predictions in the wet lab with Kd studies using Biacore SPR, to ensure that a compound at least binds. We can also do a few other preliminary studies to ensure there is indeed inhibition, perhaps even of function, as in the herpes protease case. We then pass the molecule(s) on to a well designed clinical study that can provide us with a clear answer, in a small way, on the efficacy of a drug.
So within 5 years, we would provide proof of concept that the CANDO platform works and is a viable means of doing drug discovery in virtuale.
Worse than random?
As with CASP1, we can do worse than random when we actually try to DO things. While we'll be doing wet sanity checks, we could have pathological cases that score well but don't work overwhelming our docking method. We need to watch out for these and weed them out. This will happen if we're not careful with our training/testing methods and introduce knowledge about the test set into our algorithm. The definition of "test set" is very broad; this problem also applies to "de novo" algorithms that don't have an explicit training set.
The docking platform is currently "fragment based docking with dynamics", which, thinking about it, really is an integrated dynamics/fragmentation strategy. I used to separate it as: (1) incorporation of dynamics; (2) fragmentation of compounds at their rotatable bonds, docking the fragments individually, doing the dynamics, and then rejoining the most viable conformations *. While that separation is good for seeing what's happening, (1) and (2) are heavily interdependent. Beautiful, excellent piece of work, Brady. We're now going to reevaluate and test it in the field, so I hope all the beauty and elegance in the conceptual ideas pays off.
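The fragmentation step can be illustrated with a toy sketch: treat the molecule as a bond graph, delete the rotatable bonds, and recover the rigid fragments as connected components. The molecule, atom count, and rotatable-bond labels below are invented for illustration; this is not the platform's actual code.

```python
# Toy fragmentation at rotatable bonds for a hypothetical 6-atom chain.
# Bonds are (atom_i, atom_j, rotatable?); atoms are just integer IDs.
bonds = [(0, 1, False), (1, 2, True), (2, 3, False), (3, 4, True), (4, 5, False)]
n_atoms = 6

def fragments(bonds, n_atoms):
    """Rigid fragments = connected components after deleting rotatable bonds."""
    adj = {i: set() for i in range(n_atoms)}
    for i, j, rot in bonds:
        if not rot:            # keep only the rigid (non-rotatable) bonds
            adj[i].add(j)
            adj[j].add(i)
    seen, frags = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:            # depth-first search for one component
            a = stack.pop()
            if a in seen:
                continue
            seen.add(a)
            comp.append(a)
            stack.extend(adj[a])
        frags.append(sorted(comp))
    return frags

print(fragments(bonds, n_atoms))  # → [[0, 1], [2, 3], [4, 5]]
```

Each fragment would then be docked and subjected to dynamics independently, and the surviving conformations rejoined across the cut bonds.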
Furthermore, rather than having something that will dock a single molecule to a single structure (which he has to do), Gaurav will have the luxury (or pain, depending on how you view it) of dealing with ~5000 molecules and ~50,000 structures. That's the input Brian will provide. The output Gaurav provides will be a Matrix with compounds as rows and structures as columns, along with the best 1, N, etc. ranked structures for each compound.
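A minimal sketch of what the Matrix might look like, assuming docking scores where lower is better; the compound names, structure IDs, and scores below are all hypothetical placeholders, not real output:

```python
# Matrix[compound][structure] = docking score (lower = better, by assumption).
matrix = {
    "cmpd_A": {"1abc": -7.2, "2xyz": -5.1, "3pdb": -9.4},
    "cmpd_B": {"1abc": -6.0, "2xyz": -8.3, "3pdb": -4.7},
}

def best_structures(compound, n=1):
    """Return the n best-ranked structures (columns) for one compound (row)."""
    scores = matrix[compound]
    return sorted(scores, key=scores.get)[:n]

print(best_structures("cmpd_A", 2))  # → ['3pdb', '1abc']
```

The real Matrix would be ~5000 rows by ~50,000 columns, so it would live on disk or in a dense numeric array rather than nested dicts, but the ranking operation is the same.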
To do this, I believe Gaurav will reimplement in C/C++ the entire platform we currently have in Perl, and then get a stable version running, debugged, and tested (I am still waiting to see the results from the herpes work) with Brady's scoring function (rmr6/BTTR/IRAP).
BAB/DEE type search built into the docking
BAB (branch and bound) is a family of algorithms for discrete and combinatorial optimization problems; it is basically a set of strategies for pruned enumeration. DEE (Dead-End Elimination) is also an optimization method, used to identify parts of the combinatorial space that cannot possibly be part of the global minimum.
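As a concrete illustration, here is a toy sketch of the classic DEE criterion on a two-position placement problem: state r at position i can be safely discarded if some alternative s has a worst-case total energy that is still better than r's best case. All energies below are invented for illustration.

```python
# Self energies: (position, state) -> energy.
E_self = {
    (0, "a"): 1.0, (0, "b"): 5.0,
    (1, "x"): 2.0, (1, "y"): 2.5,
}
# Pairwise energies: ((pos_i, state_i), (pos_j, state_j)) -> energy, i < j.
E_pair = {
    ((0, "a"), (1, "x")): 0.5, ((0, "a"), (1, "y")): 1.0,
    ((0, "b"), (1, "x")): 0.2, ((0, "b"), (1, "y")): 0.1,
}
states = {0: ["a", "b"], 1: ["x", "y"]}

def pair(i, r, j, t):
    """Look up a pairwise energy regardless of argument order."""
    key = ((i, r), (j, t)) if i < j else ((j, t), (i, r))
    return E_pair[key]

def dee_eliminated(i, r):
    """True if state r at position i can never be in the global minimum:
    its best case is worse than some alternative's worst case."""
    others = [j for j in states if j != i]
    best_r = E_self[(i, r)] + sum(
        min(pair(i, r, j, t) for t in states[j]) for j in others)
    for s in states[i]:
        if s == r:
            continue
        worst_s = E_self[(i, s)] + sum(
            max(pair(i, s, j, t) for t in states[j]) for j in others)
        if best_r > worst_s:
            return True
    return False

print(dee_eliminated(0, "b"))  # → True  (state "b" is a dead end)
```

Anything DEE cannot eliminate is then handed to the branch-and-bound enumeration, which now searches a much smaller space.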
It's indeed the same idea, but it's not doing the same thing, obviously (in fact, it has sort of become a fractal, working at the atomic level).
Atomic level pipeline optimization
The pipeline works at the fragment level, keeping the best options that can be joined. An external BAB/DEE will end where we don't have any joining left to do (or the joining will still have to be done).
Rather, it simply decides whether or not a path down a search tree is viable. There is overlap...
The fragment-based method allows for this and other optimisations.
We don't need fragmentation for this.
You can do this at the atomic level, eliminating unlikely candidates, and use any forcefield or docking method for this optimisation, when you're doing the shotgun style of approach. The gain in speed per compound comes from doing it shotgun style; it's a pretty obvious thing to do, but it needs to be done.
There should be a large number of published algorithms to do this (dig up old Communications of the ACM journals from the 70s, which is where I got my clique finding algorithm that works quite well).
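The 70s-era CACM clique finder referred to here may well be the Bron-Kerbosch algorithm (CACM, 1973); a minimal sketch of its basic, non-pivoting form, run on a made-up toy graph:

```python
def bron_kerbosch(R, P, X, adj, cliques):
    """Enumerate all maximal cliques.
    R: current clique; P: candidates to extend it; X: already-processed
    vertices (prevents reporting the same clique twice)."""
    if not P and not X:
        cliques.append(sorted(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P.remove(v)
        X.add(v)

# Toy graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(cliques))  # → [[0, 1, 2], [2, 3]]
```

The pivoting variant from the same paper prunes much harder on dense graphs, which is what matters at the scale of compound/structure correspondence graphs.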
1. Select a set of high resolution protein structures to screen against. The initial screen will use a curated set with a 2.5 Å resolution cutoff and a 90% sequence identity cutoff, from PISCES: http://dunbrack.fccc.edu/Guoli/pisces_download.php
2. Annotate binding sites for the docking screen. This can be done manually, with prediction software, or a combination of both.
3. Determine a set of small molecules to use in the screen. For the CANDO project, we are looking for a set of molecules that have passed toxicity studies and have shown drug-like properties.
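The steps above form a filtering pipeline; here is step 1 as a minimal sketch, applying the 2.5 Å cutoff from the list to some invented PDB records (the real culled set, including the identity filtering, would come from the PISCES server itself):

```python
# Hypothetical structure records; the real list comes from PISCES.
structures = [
    {"pdb": "1abc", "resolution": 1.8},
    {"pdb": "2xyz", "resolution": 3.1},   # fails the 2.5 A cutoff
    {"pdb": "3pdb", "resolution": 2.4},
]

def select_structures(records, max_resolution=2.5):
    """Step 1: keep only structures at or below the resolution cutoff."""
    return [r["pdb"] for r in records if r["resolution"] <= max_resolution]

print(select_structures(structures))  # → ['1abc', '3pdb']
```

Steps 2 and 3 would be analogous filters over binding-site annotations and the small-molecule library.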
Small Molecule Libraries
Binding Site Identification
paper with a bunch more, slightly older free computational screening tools