GAP PENALTIES
GAPOPEN: Penalty for the first residue in a gap (-12 by default for fasta with proteins, -16 for DNA).
GAPEXT: Penalty for additional residues in a gap (-2 by default for fasta with proteins, -4 for DNA).
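Read together, these two definitions mean that a single gap of k residues scores GAPOPEN + (k - 1) x GAPEXT (an "affine" gap penalty). The short Python sketch below shows that arithmetic. The function name `gap_penalty` is hypothetical, and the default arguments are simply the fasta protein values quoted above.

```python
def gap_penalty(length: int, gap_open: int = -12, gap_ext: int = -2) -> int:
    """Score contribution of one gap of `length` residues (hypothetical helper).

    The first residue costs gap_open; every further residue costs gap_ext.
    Defaults are the fasta protein values; pass -16, -4 for the DNA values.
    """
    if length <= 0:
        return 0  # no gap, no penalty
    return gap_open + (length - 1) * gap_ext


print(gap_penalty(3))           # -16: -12 + 2 * (-2), protein defaults
print(gap_penalty(3, -16, -4))  # -24: -16 + 2 * (-4), DNA defaults
```

Because only the first residue pays the larger GAPOPEN value, one long gap is cheaper than several short gaps covering the same number of residues.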
When two sequences are aligned, it is often necessary to insert gaps in them in order to optimise the alignment. This could be done on the basis of identities alone, inserting gaps wherever there are no matches. However, this is not recommended for biological sequence comparisons, because similarities are then not taken into consideration. Instead, a scoring scheme, often referred to as a comparison matrix, is used. It gives a high positive score when identical residues or bases are aligned, a slightly lower score when a similarity or homology is possible (i.e. a conservative substitution), and even negative scores for aligned pairs that are not biologically significant.
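To make the contrast between the two approaches concrete, here is a small Python sketch that scores an ungapped alignment twice: once by counting identities, and once with a comparison-matrix-style scheme. The score values (+5, +2, -3) and the residue groups are hypothetical placeholders chosen for illustration. They are not taken from any real matrix such as BLOSUM or PAM.

```python
# Hypothetical scores for illustration only; real programs use matrices such as BLOSUM or PAM.
IDENTITY = 5       # identical residues aligned
CONSERVATIVE = 2   # conservative substitution (both residues in the same group)
MISMATCH = -3      # pairing assumed not to be biologically significant

# Hypothetical groups of residues treated as mutually similar.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def pair_score(x: str, y: str) -> int:
    """Score one aligned pair of residues."""
    if x == y:
        return IDENTITY
    if any(x in group and y in group for group in SIMILAR_GROUPS):
        return CONSERVATIVE
    return MISMATCH


def identities(s1: str, s2: str) -> int:
    """Identity-only view: count positions where the residues are the same."""
    return sum(x == y for x, y in zip(s1, s2))


def matrix_score(s1: str, s2: str) -> int:
    """Similarity-aware view: sum the pair scores over an ungapped alignment."""
    if len(s1) != len(s2):
        raise ValueError("an ungapped alignment needs sequences of equal length")
    return sum(pair_score(x, y) for x, y in zip(s1, s2))


print(identities("ILKD", "VLRE"), matrix_score("ILKD", "VLRE"))  # 1 11
print(identities("ILKD", "GPAC"), matrix_score("ILKD", "GPAC"))  # 0 -12
```

By identities alone the two comparisons look almost equally poor (one identity versus none). The matrix-style score, however, rewards the conservative substitutions I/V, K/R and D/E in the first pair and penalises the unrelated second pair.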
When two sequences are aligned, a diagonal is created which depicts the best alignment path between them. This diagonal may be broken in places by mismatches, and if there are too many of these, the diagonal is subdivided into several smaller ones. Gaps allow these smaller diagonals to be joined into one alignment. To keep gaps from being inserted indiscriminately, gap initiation and gap extension penalties are introduced, which lower the total alignment score for every gap opened and every residue by which it is extended.
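The way the two penalties enter the alignment score can be sketched with Gotoh's three-matrix dynamic programming for global alignment with affine gaps. This is a swapped-in textbook technique: fasta's own procedure is more heuristic than this exhaustive search. The match/mismatch scores (+5/-4) are assumptions, the gap defaults are the fasta DNA values quoted above, and `align_affine` is a hypothetical name.

```python
from typing import Tuple

NEG_INF = float("-inf")


def align_affine(a: str, b: str, match: int = 5, mismatch: int = -4,
                 gap_open: int = -16, gap_ext: int = -4) -> Tuple[float, str, str]:
    """Global alignment with affine gaps (Gotoh's three-matrix recurrence).

    A gap of length k scores gap_open + (k - 1) * gap_ext.
    Returns (score, aligned_a, aligned_b) with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = gap_open + (i - 1) * gap_ext
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = gap_open + (j - 1) * gap_ext

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing the same gap costs gap_ext.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_ext, Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_ext, X[i][j - 1] + gap_open)

    # Traceback: walk back from (n, m), re-deriving which move produced each value.
    tables = {"M": M, "X": X, "Y": Y}
    i, j = n, m
    state = max("MXY", key=lambda k: tables[k][i][j])
    score = tables[state][i][j]
    top, bottom = [], []
    while i > 0 or j > 0:
        if state == "M":
            prev = M[i][j] - (match if a[i - 1] == b[j - 1] else mismatch)
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
            state = next(k for k in "MXY" if tables[k][i][j] == prev)
        elif state == "X":
            cur = X[i][j]
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
            state = "X" if X[i][j] + gap_ext == cur else next(
                k for k in "MY" if tables[k][i][j] + gap_open == cur)
        else:  # state == "Y"
            cur = Y[i][j]
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
            state = "Y" if Y[i][j] + gap_ext == cur else next(
                k for k in "MX" if tables[k][i][j] + gap_open == cur)
    return score, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = align_affine("ACGTTGCA", "ACGTGCA")
print(score)  # 19: seven identities (7 * 5) and one single-residue gap (-16)
print(top)
print(bottom)
```

Each cell keeps three scores because the cost of the next gap residue depends on whether a gap is already open. That dependency is exactly the distinction between gap initiation and gap extension.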
In general, the lower the gap penalties, the more gaps are introduced and the more identities are detected, but this should be weighed against biological significance.
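The trade-off can be seen by scoring two hand-picked candidate alignments of the same pair of sequences under two gap-opening penalties. The first penalty is the fasta DNA default of -16. The second, -8, is a hypothetical lower value. The match/mismatch scores (+5/-4) are assumptions, and a real program searches all alignments rather than comparing two fixed ones.

```python
def score_alignment(top: str, bottom: str, match: int = 5, mismatch: int = -4,
                    gap_open: int = -16, gap_ext: int = -4) -> int:
    """Score a pairwise alignment written as two equal-length rows ('-' = gap).

    A gap of k residues scores gap_open + (k - 1) * gap_ext. Assumes no column
    has a gap in both rows.
    """
    score = 0
    prev_gap = None  # which row the previous column's gap was in, if any
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            score += gap_ext if prev_gap == row else gap_open
            prev_gap = row
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


# Two hand-picked candidate alignments of ACGTTGCATG with ACGTGCAATG.
CANDIDATES = {
    "ungapped": ("ACGTTGCATG", "ACGTGCAATG"),    # 7 identities, 3 mismatches
    "gapped": ("ACGTTGCA-TG", "ACGT-GCAATG"),    # 9 identities, two 1-residue gaps
}


def best_candidate(gap_open: int) -> str:
    """Return the name of the higher-scoring candidate for a given gap_open."""
    scores = {name: score_alignment(t, b, gap_open=gap_open)
              for name, (t, b) in CANDIDATES.items()}
    return max(scores, key=scores.get)


for gap_open in (-16, -8):
    print(gap_open, best_candidate(gap_open))
# -16 ungapped   (23 vs 45 - 32 = 13)
# -8 gapped      (23 vs 45 - 16 = 29)
```

With the higher penalty, the two extra identities do not pay for two gaps. With the lower penalty, the gapped alignment wins, gaining both more gaps and more identities.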
Fasta, blast, blitz and clustalw use slightly different terms to refer to the gap initiation and gap extension penalties. In general, gapopen and opengap denote the former, while gapext and extendgap denote the latter.
Some later improvements to these programs include the possibility of penalising gaps in the database sequences and in the query sequence separately; blitz is one such case. In clustalw, the gap penalty separately penalises the length of a gap, the closing of a gap and the introduction of a pairwise gap in both sequences.
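Penalising gaps separately in the query and the database sequence can be sketched by giving each row of an alignment its own opening/extension pair. This is a minimal illustration under assumptions: `GapPenalties`, `gap_total` and `total_gap_penalty` are hypothetical names, the penalty values are made up, and nothing here reflects blitz's actual parameters or scoring.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GapPenalties:
    """Hypothetical per-sequence gap settings."""
    gap_open: int  # penalty for the first residue of a gap
    gap_ext: int   # penalty for each additional residue of the same gap


def gap_total(row: str, pen: GapPenalties) -> int:
    """Sum the penalties of every run of '-' in one row of an alignment."""
    return sum(pen.gap_open + (len(run.group()) - 1) * pen.gap_ext
               for run in re.finditer(r"-+", row))


def total_gap_penalty(query_row: str, db_row: str,
                      query_pen: GapPenalties, db_pen: GapPenalties) -> int:
    """Gaps in the query row and in the database row use their own penalties."""
    return gap_total(query_row, query_pen) + gap_total(db_row, db_pen)


query_row = "ACG--TACGT"  # one 2-residue gap in the query
db_row = "ACGTTTAC-T"     # one 1-residue gap in the database sequence
print(total_gap_penalty(query_row, db_row,
                        GapPenalties(-12, -2), GapPenalties(-8, -2)))  # -22
```

Here the query gap costs -12 + (-2) = -14 and the database gap costs -8, so choosing a cheaper opening penalty for one sequence makes the alignment more willing to place gaps in that sequence.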