Document retrieved from a search engine cache.
Original document address: http://monkey.belozersky.msu.ru/~psn/about.htm
Last modified: Fri Feb 13 21:38:15 2004. Indexed: Mon Oct 1 19:27:26 2012.
SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins
What is SDPpred?

SDPpred is a tool for predicting residues in protein sequences that determine functional differences between proteins sharing the same general biochemical function. Many protein families contain homologous proteins that have a common biological function but differ in specificity towards substrates, ligands, effectors, DNA, proteins and other interacting molecules, including other monomers of the same protein. All these interactions must be highly specific. Our aim is to find the amino acid residues that account for the different specificities of proteins from one family, i.e. to distinguish amino acid substitutions caused by random evolutionary processes from those caused by a switch of specificity.

Amino acid residues that determine differences in functional specificity and account for correct recognition of interaction partners are usually thought to correspond to those positions of a multiple protein alignment where the distribution of amino acids is closely associated with the grouping of proteins by specificity. SDPpred searches for positions that are well conserved within specificity groups but differ between them. These positions are called SDPs (specificity-determining positions). Such positions are obvious in alignments containing a small number of proteins and specificity groups, but they become hard to find in large protein families with a variety of specificities.

The only information required for prediction of SDPs is a multiple alignment of protein sequences divided into specificity groups (see Input format for details). SDPpred can analyze alignments of up to 2000 positions containing at most 1000 proteins, divided into up to 1000 specificity groups. However, it is recommended that each group contain at least three sufficiently divergent sequences. On the other hand, the average identity within each group should not be less than 25%. Having more than two groups also strongly improves the quality of prediction, because the background evolutionary similarity is eliminated more efficiently.

SDPpred predicts a set of SDPs and maps them onto the multiple alignment of the protein family or onto a user-selected protein in this alignment (see Output format for details).
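The idea of positions "conserved within specificity groups but differing between them" can be illustrated with a deliberately naive Python sketch. This is not the statistic SDPpred computes; the function name `naive_sdp_columns`, the `min_within` threshold and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def naive_sdp_columns(groups, min_within=1.0):
    """Return 0-based columns that are conserved inside every group
    while at least two groups have different consensus residues.

    groups: dict mapping group name -> list of equal-length aligned sequences.
    min_within: minimum fraction of a group's sequences that must share
                the group's most common residue (hypothetical parameter).
    """
    length = len(next(iter(groups.values()))[0])
    hits = []
    for col in range(length):
        consensus = []
        for seqs in groups.values():
            residue, count = Counter(s[col] for s in seqs).most_common(1)[0]
            # Skip the column if a group is dominated by gaps or is not conserved.
            if residue == "-" or count / len(seqs) < min_within:
                break
            consensus.append(residue)
        else:
            # Every group is conserved; keep the column only if groups differ.
            if len(set(consensus)) > 1:
                hits.append(col)
    return hits


# Hypothetical toy alignment with two specificity groups.
toy = {
    "GroupA": ["MKLAG", "MRLAG", "MKLAG"],
    "GroupB": ["MKVAG", "MRVAG", "MHVAG"],
}
print(naive_sdp_columns(toy))
```

In the toy alignment only the third column (index 2) is uniform within each group and different between the two groups. A real analysis also needs a measure of statistical significance, which this sketch omits.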
Input format

The only information needed for prediction of SDPs is a multiple alignment of protein sequences divided into specificity groups. The aligned sequences should be in FASTA, GDE, or Pfam plain-text alignment format (in the latter case with gaps as dashes and all characters in upper case). The alignment must be edited manually to define the specificity groups: each group is preceded by a line that begins with an "equals" sign and contains the name of the group that follows, e.g. =Group1. The group name may be framed by any number of spaces and "equals" signs, so '=== Group1 ===' is also a valid header for the group named 'Group1'. Thus the input alignment should look either like this:
or this:
or this:
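As a rough illustration of the group-header convention described above, the following Python sketch reads a FASTA alignment annotated with "equals" header lines. It is not SDPpred's own parser: the function name `parse_grouped_fasta`, the decision to reject sequences that precede the first group header, and the sample sequences are assumptions, and the GDE and Pfam formats are not handled.

```python
import re

# A header starts with "=", and the name may be framed by spaces and "=" signs.
HEADER = re.compile(r"^=[=\s]*(.*?)[=\s]*$")


def parse_grouped_fasta(text):
    """Return {group name: {sequence name: aligned sequence}}."""
    groups = {}
    current_group = None
    current_name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current_group = match.group(1)
            if not current_group:
                raise ValueError("group header without a name: %r" % line)
            groups.setdefault(current_group, {})
            current_name = None
        elif line.startswith(">"):
            if current_group is None:
                raise ValueError("sequence found before any group header")
            current_name = line[1:].strip()
            groups[current_group][current_name] = ""
        else:
            if current_name is None:
                raise ValueError("sequence data without a FASTA header")
            # FASTA sequences may span several lines; join them.
            groups[current_group][current_name] += line
    return groups


# Hypothetical input using both header styles.
sample = """\
=== GroupA ===
>seq1
MKL-AG
>seq2
MRL-AG
=GroupB
>seq3
MKVTAG
"""

for name, members in parse_grouped_fasta(sample).items():
    print(name, sorted(members))
```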
The user should also select the number of shuffles used to compute statistical significance (between 1 000 and 10 000). An alignment of a thousand sequences divided into several hundred specificity groups is analyzed in a couple of hours if each column is shuffled 10 000 times. Using fewer shuffles reduces the required time proportionally but makes the results less reliable. Typically the top of the SDP list remains the same, but minor variations may appear near the cutoff.

The last parameter is the maximum allowed percentage of gaps in a column. Columns with a greater fraction of gaps are excluded from the analysis. Typically this value should not exceed 30%, but if you are interested in finding, for instance, group-specific loops, it may be reasonable to set it higher. However, a high allowed gap percentage produces many SDPs at the termini of the alignment, where the alignment is likely to be incorrect.

The following conditions on the input alignment must be satisfied:
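A rough pre-submission check of the limits stated on this page (at most 2000 positions, 1000 proteins and 1000 groups, and 1 000 to 10 000 shuffles), together with the gap-percentage filter, might look like the Python sketch below. The function names are hypothetical, the equal-length requirement is an assumption that follows from the input being an alignment, and the server may apply further checks not covered here.

```python
MAX_POSITIONS = 2000
MAX_PROTEINS = 1000
MAX_GROUPS = 1000


def check_limits(groups, shuffles):
    """Return a list of problems found; an empty list means none were found.

    groups: dict mapping group name -> list of aligned sequences.
    shuffles: requested number of shuffles per column.
    """
    seqs = [s for members in groups.values() for s in members]
    if not seqs:
        return ["alignment is empty"]
    problems = []
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        problems.append("aligned sequences differ in length")
    if max(lengths) > MAX_POSITIONS:
        problems.append("alignment is longer than %d positions" % MAX_POSITIONS)
    if len(seqs) > MAX_PROTEINS:
        problems.append("more than %d proteins" % MAX_PROTEINS)
    if len(groups) > MAX_GROUPS:
        problems.append("more than %d specificity groups" % MAX_GROUPS)
    if not 1000 <= shuffles <= 10000:
        problems.append("number of shuffles must be between 1000 and 10000")
    return problems


def columns_within_gap_limit(seqs, max_gap_percent=30.0):
    """Return 0-based columns whose gap percentage does not exceed the limit."""
    keep = []
    for col in range(len(seqs[0])):
        gaps = sum(1 for s in seqs if s[col] == "-")
        if 100.0 * gaps / len(seqs) <= max_gap_percent:
            keep.append(col)
    return keep


# Hypothetical four-sequence alignment: column 1 is 75% gaps, column 3 is 25% gaps.
example = ["A-C-", "A-CD", "AGCD", "A-CD"]
print(columns_within_gap_limit(example))        # default 30% limit
print(columns_within_gap_limit(example, 20.0))  # stricter limit
```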
Output format

SDPpred outputs the set of SDPs, i.e. the positions of the alignment that are likely to determine differences in functional specificity between the provided groups. The amino acid distribution at these positions is highly correlated with the grouping by specificity. The set of SDPs can be visualized in several ways:
An example

Here we provide an example of how SDPpred works. Consider the MIP family of membrane channels, which includes 17 proteins, all from bacteria. These proteins are divided into two groups, the GLP group of proteins transporting mainly glycerol, and the AQP group of proteins transporting water. The input alignment looks like this:

The obtained set of SDPs consists of 10 positions. Here is the result page:
The amino acids listed below the alignment correspond to the first protein of the first group, namely __. By choosing another protein in the pull-down menu, one gets these numbers recalculated for that protein. Choosing the "List of SDPs" option gives the list of SDPs, which looks like this:
The plot of probabilities for setting the cutoff for this alignment looks as follows:
The set of SDPs shown in the previous screenshots was formed by setting the cutoff at 10, which corresponds to the global minimum of the plot. However, the second minimum can be of interest as well. Clicking on it sets a new cutoff, and the result pages change accordingly; for example, the page displaying the list of SDPs would look as follows:
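The choice between the global minimum and secondary minima of the probability plot can be sketched in Python as follows. The function name `cutoff_candidates` and the probability values are hypothetical; how SDPpred computes the plotted probabilities is not described on this page, so the sketch only covers locating minima in a given curve.

```python
def cutoff_candidates(probabilities):
    """Locate candidate cutoffs in a probability curve.

    probabilities[i] is the plotted value for a cutoff of i + 1 positions.
    Returns (global_cutoff, local_cutoffs), both as 1-based cutoff values.
    """
    if not probabilities:
        raise ValueError("probability curve is empty")
    n = len(probabilities)
    local = []
    for i, p in enumerate(probabilities):
        left = probabilities[i - 1] if i > 0 else float("inf")
        right = probabilities[i + 1] if i < n - 1 else float("inf")
        # Strict comparison: flat plateaus are not reported as minima.
        if p < left and p < right:
            local.append(i + 1)
    global_cutoff = min(range(n), key=probabilities.__getitem__) + 1
    return global_cutoff, local


# Hypothetical curve with two minima, at cutoffs 3 and 6.
curve = [0.5, 0.2, 0.05, 0.1, 0.3, 0.08, 0.4]
best, candidates = cutoff_candidates(curve)
print(best, candidates)
```

The global minimum plays the role of the default cutoff, while the remaining entries of the candidate list correspond to alternative cutoffs a user might select by clicking on the plot.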