Document retrieved from a search engine cache. Original document address: http://monkey.belozersky.msu.ru/~psn/algo.htm
Last modified: Mon Feb 9 19:47:18 2004. Indexed: Mon Oct 1 19:26:34 2012.
SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins
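Consider a multiple protein sequence alignment. The proteins are divided into N specificity groups, numbered i = 1, ..., N. The goal is to identify columns (positions) of the alignment in which the amino acid distribution is closely associated with the grouping by specificity. This association in column p of the alignment is measured by the mutual information

$$I_p = \sum_{i=1}^{N} \sum_{\alpha=1}^{20} f_p(\alpha, i) \log \frac{f_p(\alpha, i)}{f_p(\alpha)\, f(i)},$$

where $f_p(\alpha, i)$ is the ratio of the number of occurrences of residue $\alpha$ in group i at column p to the height of the whole alignment column, $f_p(\alpha)$ is the frequency of residue $\alpha$ in the whole column, and $f(i)$ is the fraction of proteins that belong to group i.

To address the facts that the frequencies are calculated from a small sample, and that substitutions to amino acids with similar physical properties should be penalized only weakly, the observed amino acid frequencies are modified.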
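As a reference point, the mutual information of a single column computed from raw, unmodified frequencies might be sketched as follows. The sketch assumes the standard definition of mutual information between the residue and the group label, with the natural logarithm; the function name `mutual_information` and the toy columns are illustrative and are not taken from SDPpred.

```python
import math
from collections import Counter

def mutual_information(column, groups):
    """Mutual information between residues of one column and group labels.

    column: sequence of residues, one per aligned protein
    groups: sequence of group labels, in the same order
    Uses raw (unsmoothed) frequencies and the natural logarithm.
    """
    total = len(column)
    joint = Counter(zip(column, groups))   # counts of (residue, group) pairs
    residue_counts = Counter(column)       # counts over the whole column
    group_counts = Counter(groups)         # group sizes
    mi = 0.0
    for (residue, group), n in joint.items():
        f_joint = n / total
        f_residue = residue_counts[residue] / total
        f_group = group_counts[group] / total
        mi += f_joint * math.log(f_joint / (f_residue * f_group))
    return mi

# A column that separates two groups perfectly, and one that carries no signal.
perfect = mutual_information("AAAGGG", [1, 1, 1, 2, 2, 2])
uninformative = mutual_information("AGAG", [1, 1, 2, 2])
```

Pairs that never occur contribute nothing to the sum, so iterating over the observed pairs only is enough.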
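Instead of using the raw frequency $n(\alpha, i)/n(i)$, where $n(\alpha, i)$ is the number of occurrences of residue $\alpha$ in group i and $n(i)$ is the size of group i (here i is a single group or the whole alignment), SDPpred uses smoothed frequencies $\tilde{f}(\alpha, i)$.

To calculate the statistical significance of the obtained values of $I_p$, each column is shuffled, yielding the distribution of shuffled values $I_p^{sh}$, with mean $\langle I_p^{sh} \rangle$ and standard deviation $\sigma(I_p^{sh})$.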
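The shuffling step can be illustrated with a short sketch. It assumes that shuffling means randomly permuting the residues of a column while the group labels stay fixed, and it uses raw rather than smoothed frequencies for brevity. The helper names (`mutual_information`, `shuffled_distribution`), the number of shuffles, and the seed are illustrative choices, not SDPpred parameters.

```python
import math
import random
import statistics
from collections import Counter

def mutual_information(column, groups):
    """Mutual information of one column, raw frequencies, natural log."""
    total = len(column)
    joint = Counter(zip(column, groups))
    residue_counts = Counter(column)
    group_counts = Counter(groups)
    return sum(
        (n / total) * math.log(
            (n / total)
            / ((residue_counts[r] / total) * (group_counts[g] / total))
        )
        for (r, g), n in joint.items()
    )

def shuffled_distribution(column, groups, n_shuffles=1000, seed=0):
    """Mean and standard deviation of the mutual information over
    random permutations of the column (group labels stay fixed)."""
    rng = random.Random(seed)      # local generator, reproducible
    residues = list(column)
    values = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)      # permute residues in place
        values.append(mutual_information(residues, groups))
    return statistics.mean(values), statistics.pstdev(values)

mean_sh, sd_sh = shuffled_distribution("AAAGGG", [1, 1, 1, 2, 2, 2], n_shuffles=200)
```

Shuffling preserves the amino acid composition of the column and the group sizes, so the resulting distribution reflects only the association between the two.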
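To offset the background similarity of proteins, which is higher within groups than between groups, we calculate the expected mutual information for column p,

$$I_p^{exp} = a \langle I_p^{sh} \rangle + b,$$

where $\langle I_p^{sh} \rangle$ is the mean mutual information of the shuffled column p, and a and b do not depend on the position, i.e. are the same for every position of the alignment, so that

$$\sum_p \left( I_p - I_p^{exp} \right)^2 \to \min.$$

Then, Z-scores are calculated:

$$Z_p = \frac{I_p - I_p^{exp}}{\sigma(I_p^{sh})},$$

where $\sigma(I_p^{sh})$ is the standard deviation of the mutual information of the shuffled column p.

Given a series of Z-scores corresponding to every position of the multiple alignment, one needs to evaluate their significance in order to tell whether an observed Z-score is sufficiently high to indicate an SDP (specificity-determining position). SDPpred uses an automated procedure for setting the threshold, based on the computation of the Bernoulli estimator.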
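The Z-score step can be sketched under two stated assumptions: that the expected mutual information is a linear function of the shuffled mean, with position-independent coefficients a and b fitted by ordinary least squares over all columns, and that the Z-score divides the residual by the standard deviation of the shuffled distribution. The function names and the input numbers below are hypothetical.

```python
def fit_expected(observed, shuffled_means):
    """Least-squares fit of observed ~ a * shuffled_mean + b over all columns.

    Assumes at least two columns with differing shuffled means.
    """
    n = len(observed)
    mean_x = sum(shuffled_means) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in shuffled_means)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(shuffled_means, observed))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def z_scores(observed, shuffled_means, shuffled_sds):
    """Z-score per column: (observed - expected) / shuffled standard deviation.

    Assumes every shuffled standard deviation is positive.
    """
    a, b = fit_expected(observed, shuffled_means)
    return [
        (i_p - (a * m + b)) / s
        for i_p, m, s in zip(observed, shuffled_means, shuffled_sds)
    ]

# Hypothetical per-column values, for illustration only.
observed = [0.12, 0.25, 0.31, 0.90]
shuffled_means = [0.10, 0.20, 0.30, 0.40]
shuffled_sds = [0.05, 0.05, 0.06, 0.07]
scores = z_scores(observed, shuffled_means, shuffled_sds)
```

Because a and b are shared by all columns, a column receives a high Z-score only when its mutual information exceeds what the overall trend across the alignment predicts.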
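The observed Z-scores are arranged in decreasing order: $Z_{(1)} \ge Z_{(2)} \ge \dots \ge Z_{(L)}$, where L is the number of alignment positions.
The threshold is defined as:

$$k^* = \arg\min_k P(\text{there are at least } k \text{ observed Z-scores } Z \ge Z_{(k)}),$$

$$P = 1 - \sum_{i=0}^{k-1} \binom{L}{i} q^i p^{L-i}, \qquad p = P(Z < Z_{(k)}) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_{(k)}} e^{-Z^2/2}\, dZ, \qquad q = 1 - p.$$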
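One way to read the Bernoulli estimator is sketched below. It assumes that, under the null hypothesis, the Z-scores are independent standard normal values, and it selects the rank k for which the probability of seeing at least k values as high as the k-th largest is smallest. The names `tail_probability` and `bernoulli_cutoff` and the example scores are illustrative and not part of SDPpred.

```python
import math

def tail_probability(k, z_k, n_positions):
    """P(at least k of n independent standard normal values are >= z_k).

    The upper binomial tail is summed directly; it equals one minus the
    sum over i < k, but avoids cancellation for very small probabilities.
    """
    q = 0.5 * math.erfc(z_k / math.sqrt(2.0))   # P(Z >= z_k)
    p = 1.0 - q                                 # P(Z <  z_k)
    return sum(
        math.comb(n_positions, i) * q ** i * p ** (n_positions - i)
        for i in range(k, n_positions + 1)
    )

def bernoulli_cutoff(z_scores):
    """Return (k_star, threshold): the rank minimizing the tail
    probability and the Z-score found at that rank."""
    ordered = sorted(z_scores, reverse=True)
    n = len(ordered)
    probabilities = [
        tail_probability(k, ordered[k - 1], n) for k in range(1, n + 1)
    ]
    k_star = min(range(1, n + 1), key=lambda k: probabilities[k - 1])
    return k_star, ordered[k_star - 1]

# Hypothetical scores: three strong positions on top of a weak background.
scores = [6.0, 5.5, 5.0,
          -1.5, -1.2, -1.0, -0.8, -0.6, -0.4, -0.3, -0.2, -0.1,
          0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.3]
k_star, threshold = bernoulli_cutoff(scores)
```

Positions whose Z-score is at least `threshold` would then be the ones reported; no cutoff has to be chosen by hand.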