|
Документ взят из кэша поисковой машины. Адрес
оригинального документа
: http://mouse.belozersky.msu.ru/~evgeniy/cgi-bin/wLake/alg.html
Дата изменения: Fri May 6 13:34:55 2011 Дата индексирования: Tue Oct 2 05:34:05 2012 Кодировка: |
wLake 1.7 algrithm
3D structures of proteins or their complexes are considered as related
if their significant parts can be reliably superimposed. Two water
molecules from different structures are called equivalent if the
distance between the centers of their oxygen atoms is less than a
threshold.
A set of pairwise equivalent water molecules from different
structures is called a cluster of water molecules.
An input of the algorithm is a set of superimposed related
structures. The goal is to find all ConWMs in the input.
Lemma. A graph G of |V| vertices and |E| edges cannot contain any
cliques of a size Min or greater if |V| is less than Min or |E| is
less than Min * (Min - 1) / 2. G is a complete graph if |E| = |V| *
(|V| - 1) / 2. Lemma is used in algorithm to reject graphs that
cannot contain cliques and to check the completeness of a graph.
At the first stage, a special graph is constructed. Each water
molecule corresponds to a vertex in the graph. Two vertices of the
graph are joined by an edge if water molecules are equivalent.
Thus, a ConWM corresponds to a clique in the graph. To find
all cliques of a size exceeding a fixed minimum (Min) the following
algorithm was developed.
ALGORITHM wLake (Graph G)
If G can't contain a clique of >= Min vertices then
return NULL
If G is complete then
return G
Vmin := a vertex from G with the minimal valence
Vs := {Vmin}
{vertices connected with Vmin}
Es := {edges connecting any pair of vertices
Vs}
G' := new Graph ()
For each V
Vs do
Add copy of V into G'
For each E
Es do
Add copy of E into G'
Delete Vmin from G
return wLake (G)
wLake (G')
This generally exponential algorithm works rather fast in cases of
graphs of water molecules (a few seconds of CPU for about thirty
superimposed structures).
The result of the program is a list of ConWMs. To estimate the
reliability of each detected cluster, we developed a special
program WLStat3. The input of the program is a list of clusters of size 2
and more detected by wLake 1.7 and a numbers of total and unclusters water
molecules in the every experimental structure. WLStat3 randomly put all
the water molecules into the different hydration site (the hydration site
is a cluster of water molecules or a simple unclustered molecule) 2000
times and count the number of clusters of a size 2, 3, etc observed. Than
E-value for each detected SWM cluster is determined.
wLake and WLStat programs can be downloaded for off-line usage
here.