allpy
view allpy/base.py @ 284:bee4d155f526
Added base.Sequence.from_fasta
author | Daniil Alexeyevsky <me.dendik@gmail.com> |
---|---|
date | Wed, 15 Dec 2010 23:44:20 +0300 |
parents | 59320c160dae |
children | cf6cdc3b7ec5 |
line source
14 """Class of monomer types.
16 Each MonomerType object represents a known monomer type, e.g. Valine,
17 and is referenced by each Monomer instance in a given sequence.
19 - `name`: full name of monomer type
20 - `code1`: one-letter code
21 - `code3`: three-letter code
22 - `is_modified`: either True or False
24 class attributes:
26 - `by_code1`: a mapping from one-letter code to MonomerType object
27 - `by_code3`: a mapping from three-letter code to MonomerType object
28 - `by_name`: a mapping from monomer name to MonomerType object
29 - `instance_type`: class of Monomer objects to use when creating new
30 objects; this must be redefined in descendent classes
32 All of the class attributes MUST be redefined when subclassing.
33 """
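The registry pattern described in this docstring can be sketched as follows. This is a minimal illustration, not the code from `base.py`: the constructor signature and the `AminoAcidType` subclass are assumptions made for the example. It shows why the lookup tables must be redefined in each subclass: otherwise all subclasses would share the dictionaries of the parent class.

```python
class MonomerType(object):
    """Simplified stand-in for the real class (assumed constructor)."""

    # Class-level lookup tables; subclasses must redefine them.
    by_code1 = {}
    by_code3 = {}
    by_name = {}

    def __init__(self, name, code1, code3, is_modified=False):
        self.name = name
        self.code1 = code1
        self.code3 = code3
        self.is_modified = is_modified
        # Register the new type in the tables of its own class.
        cls = type(self)
        cls.by_code1[code1] = self
        cls.by_code3[code3] = self
        cls.by_name[name] = self

    @classmethod
    def from_code1(cls, code1):
        """Return monomer type by one-letter code."""
        return cls.by_code1[code1]


class AminoAcidType(MonomerType):
    """Hypothetical subclass with its own, separate lookup tables."""

    by_code1 = {}
    by_code3 = {}
    by_name = {}


valine = AminoAcidType("Valine", "V", "VAL")
```

With this layout `AminoAcidType.from_code1("V")` returns the `valine` object, while the tables of `MonomerType` itself stay empty.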
49 # We duplicate distinguished long names into MonomerType itself,
50 # so that we can use MonomerType.from_code3 to create the relevant
51 # type of monomer.
55 @classmethod
57 """Create all relevant instances of MonomerType.
59 `type_letter` is one of:
61 - 'p' for protein
62 - 'd' for DNA
63 - 'r' for RNA
65 `codes` is a table of monomer codes
66 """
71 @classmethod
73 """Return monomer type by one-letter code."""
76 @classmethod
78 """Return monomer type by three-letter code."""
81 @classmethod
83 """Return monomer type by name."""
87 """Create a new monomer of given type."""
96 """Monomer object.
98 attributes:
100 - `type`: type of monomer (a MonomerType object)
102 class attributes:
104 - `monomer_type`: either MonomerType or one of its subclasses; it is used
105 when creating new monomers. It SHOULD be redefined when subclassing
106 Monomer.
107 """
113 @classmethod
117 @classmethod
121 @classmethod
131 """Sequence of Monomers.
133 This behaves like a list of monomer objects. In addition to standard list
134 behaviour, Sequence has the following attributes:
136 * name -- str with the name of the sequence
137 * description -- str with description of the sequence
138 * source -- str denoting source of the sequence
140 Any of them may be empty (i.e. hold an empty string).
142 Class attributes:
144 * monomer_type -- type of monomers in sequence, must be redefined when
145 subclassing
146 """
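A list subclass carrying the three extra attributes could look like the sketch below. The constructor signature is an assumption made for illustration; the real class also ties the monomers to `monomer_type`, which is omitted here.

```python
class Sequence(list):
    """Simplified stand-in: a list of monomers plus descriptive attributes."""

    def __init__(self, monomers=(), name="", description="", source=""):
        super(Sequence, self).__init__(monomers)
        # Any of these may be left as an empty string.
        self.name = name
        self.description = description
        self.source = source


# Plain strings stand in for Monomer objects in this example.
seq = Sequence(["V", "A", "L"], name="example")
```

Because the class inherits from `list`, indexing, slicing, iteration and `len()` work without extra code.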
166 """Returns sequence in one-letter code."""
169 @classmethod
171 """Create a sequence from a string of one-letter codes."""
176 @classmethod
178 """Read sequence from FASTA file.
180 File must contain exactly one sequence.
181 """
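The "exactly one sequence" requirement can be enforced while parsing. The helper below is a hypothetical standalone function, not the method from `base.py`; it assumes the header line has the form `>name description` and returns plain strings rather than a `Sequence` object.

```python
import io


def read_single_fasta(file):
    """Return (name, description, body) from a FASTA file with one record."""
    name = None
    description = ""
    body = []
    for line in file:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                raise ValueError("file must contain exactly one sequence")
            # Assumed header layout: first word is the name, rest is description.
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
        else:
            if name is None:
                raise ValueError("sequence data found before a header line")
            body.append(line)
    if name is None:
        raise ValueError("no sequence found")
    return name, description, "".join(body)


example = io.StringIO(u">seq1 test protein\nMKV\nLA\n")
record = read_single_fasta(example)
```

A second `>` line or a file without any header raises `ValueError` instead of silently returning partial data.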
189 """Alignment.
191 Behaves like a list of Columns.
192 """
193 # _sequences -- list of Sequence objects. Sequences don't contain gaps
194 # - see sequence.py module
197 """overloaded constructor
199 Alignment()
200 new empty Alignment
202 Alignment(sequences, body)
203 new Alignment with sequences and body initialized from arguments
205 Alignment(fasta_file)
206 new Alignment, read body and sequences from fasta file
207 """
218 """ Returns width, i.e. the length of each sequence with gaps """
222 """ The number of sequences in alignment (its thickness). """
226 """ Calculate the identity of alignment positions for colouring.
228 For every (row, column) position in the alignment, the percentage of
229 exactly the same residue in the same column of the alignment is calculated.
230 The data structure is just like Alignment.body, but instead of
231 monomers it contains float percentages.
232 """
233 # Oh, God, that's awful! Absolutely not understandable.
234 # First, calculate percentages of amino acids in every column
250 # Second, map these percentages onto the alignment
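The two steps named in the comments can be written more compactly with `collections.Counter`. This is a sketch under stated assumptions, not the original implementation: the body is taken to be a list of equal-length rows, gaps are represented by `None`, the percentage is computed relative to the total number of rows, and gap cells get `0.0`.

```python
from collections import Counter


def identity_percentages(body):
    """Return a table shaped like `body` holding per-cell identity percentages."""
    if not body:
        return []
    height = len(body)
    width = len(body[0])
    # First, count how often each residue occurs in every column (gaps skipped).
    column_counts = [
        Counter(row[i] for row in body if row[i] is not None)
        for i in range(width)
    ]
    # Second, map these counts back onto every (row, column) cell.
    return [
        [
            0.0 if monomer is None else 100.0 * column_counts[i][monomer] / height
            for i, monomer in enumerate(row)
        ]
        for row in body
    ]


# Plain strings stand in for monomer objects in this example.
table = identity_percentages([["A", "C"], ["A", "G"], ["A", None]])
```

In this example the first column is fully conserved, so every cell in it gets `100.0`.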
264 @classmethod
266 """ Import data from fasta file
268 >>> import alignment
269 >>> sequences,body=alignment.Alignment.from_fasta(open("test.fasta"))
270 """
289 #if there is description
316 @staticmethod
318 """ Constructs new alignment from sequences
320 Adds None values to the right end so that all alignment sequences have equal length
321 """
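The padding step can be illustrated on its own. `pad_rows` is a hypothetical helper name; the sketch assumes each row is a sequence of monomers and that `None` marks a gap.

```python
def pad_rows(rows):
    """Right-pad every row with None so that all rows have the same length."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [None] * (width - len(row)) for row in rows]


padded = pad_rows([["A", "C", "G"], ["A"]])
```

The longest row determines the alignment width; shorter rows receive trailing gaps only, so no residues are shifted.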
331 """ Saves alignment to given file
333 Splits long lines to substrings of length=long_line
334 To prevent this, set long_line=None
335 """
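The effect of `long_line` can be sketched as below. This is an illustration only: the record layout `(name, description, row)`, the default of 70 characters and the `-` gap character are assumptions, not values taken from `base.py`.

```python
import io


def write_fasta(out_file, records, long_line=70, gap="-"):
    """Write (name, description, row) records, wrapping lines at long_line."""
    for name, description, row in records:
        header = ">" + name
        if description:
            header += " " + description
        out_file.write(header + "\n")
        # None marks a gap in the row.
        line = "".join(gap if monomer is None else monomer for monomer in row)
        if long_line is None:
            # Wrapping disabled: write the whole row as one line.
            out_file.write(line + "\n")
        else:
            for start in range(0, len(line), long_line):
                out_file.write(line[start:start + long_line] + "\n")


wrapped = io.StringIO()
write_fasta(wrapped, [("s1", "", ["A", "C", None, "G", "T"])], long_line=2)

unwrapped = io.StringIO()
write_fasta(unwrapped, [("s1", "", ["A", "C", None, "G", "T"])], long_line=None)
```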
339 """ Simple align this alignment using sequences (muscle)
341 uses old Monomers and Sequences objects
342 """
367 """ returns list of columns of alignment
369 sequence or sequences:
370 * if sequence is given, then column is (original_monomer, monomer)
371 * if sequences is given, then column is (original_monomer, {sequence: monomer})
372 * if both of them are given, it is an error
374 original (Sequence type):
375 * if given, this filters only columns represented by original sequence
376 """
392 """ Returns string representing secondary structure """
398 """ Block of alignment
400 Mandatory data:
402 * self.alignment -- alignment object, which the block belongs to
403 * self.sequences - set of sequence objects that contain monomers
404 and/or gaps, that constitute the block
405 * self.positions -- list of positions of the alignment.body that
406 are included in the block; positions[i+1] is always to the right of positions[i]
408 Don't change self.sequences -- it may be a reference to another block's sequences
410 How to create a new block:
412 >>> import alignment
413 >>> import block
414 >>> proj = alignment.Alignment(open("test.fasta"))
415 >>> block1 = block.Block(proj)
416 """
419 """ Builds new block from alignment
421 if sequences==None, all sequences are used
422 if positions==None, all positions are used
423 """
433 """ Saves alignment to given file in fasta-format
435 No changes in the names, descriptions or order of the sequences
436 are made.
437 """
448 """ Returns length-sorted list of blocks, representing GCs
450 * max_delta -- threshold of distance spreading
451 * timeout -- Bron-Kerbosch timeout (then the fast O(n ln n) algorithm is used)
452 * minsize -- min size of each core
453 * ac_new_atoms -- min part of new atoms in new alternative core;
454 current GC is compared with each of the already selected GCs; if the
455 difference is less than ac_new_atoms, current GC is skipped
456 (difference = part of new atoms in current core)
457 * ac_count -- max number of cores (including main core)
458 -1 means infinity
460 If more than one pdb chain is provided for some sequence, all of them are considered
461 cost is calculated as 1 / (delta + 1)
463 delta in [0, +inf) => cost in (0, 1]
464 """
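The cost formula from this docstring maps a non-negative distance spread to a value in (0, 1]. A minimal sketch (the function name `cost` is chosen for illustration):

```python
def cost(delta):
    """Return 1 / (delta + 1) for delta in [0, +inf)."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return 1.0 / (delta + 1.0)


perfect = cost(0)  # no spread gives the maximal cost of 1.0
half = cost(1)     # a spread of 1 halves the cost
```

The cost decreases monotonically as delta grows and never reaches zero.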
492 break
496 break
500 """ Returns string consisting of gap chars and chars x at self.positions
502 Length of returned string = length of alignment
503 """
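The string described here can be built directly from the block positions. The standalone function below is a sketch: the parameters `length`, `positions`, `x` and `gap` and their defaults are assumptions for the example, whereas the real method reads the positions from `self.positions`.

```python
def xstring(length, positions, x="X", gap="-"):
    """Return a string of `length` chars with `x` at the given positions."""
    marked = set(positions)
    return "".join(x if i in marked else gap for i in range(length))


mask = xstring(6, [1, 2, 4])
```

Here `mask` is `"-XX-X-"`: one character per alignment column, with `X` marking the columns that belong to the block.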
510 """ Save xstring and name in fasta format """
514 """ Iterates monomers of this sequence from this block """
519 """ Iterates Ca-atoms of monomers of this sequence from this block """
523 """ Iterates pairs (sequence, chain) """
530 """ Superimpose all pdb_chains in this block """
539 # Apply rotation/translation to the moving atoms
543 """ Save all sequences
545 Returns {(sequence, chain): CHAIN}
546 CHAIN is chain letter in new file
547 """
553 # TODO: read from tmp_file.name
554 # change CHAIN
555 # add to out_file
559 # vim: set ts=4 sts=4 sw=4 et: