view allpy/base.py @ 1103:029a2d4a5abd
markup_to_file did not actually work until now; instead, it removed all gaps from the alignment
author | Daniil Alexeyevsky <dendik@kodomo.fbb.msu.ru> |
---|---|
date | Sat, 09 Jun 2012 21:23:11 +0400 |
parents | afed1fd8920c |
children | 2b3cad50c2b1 |
line source
8 # import this very module as means of having all related classes in one place
12 """Set of characters to recognize as gaps when parsing alignment."""
15 """Monomer object."""
18 """One of 'dna', 'rna', 'protein'."""
21 """Mapping of related types. SHOULD be redefined in subclasses."""
24 """A mapping from 1-letter code to Monomer subclass."""
27 """A mapping from 3-letter code to Monomer subclass."""
30 """A mapping from full monomer name to Monomer subclass."""
33 """A sequence the monomer belongs to."""
35 @classmethod
37 """Create new subclass of Monomer for given monomer type."""
39 pass
50 # Save the class in data.monomers so that it can be pickled
51 # Some names are not unique, we append underscores to them
52 # in order to fix it.
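The underscore trick described in the comments above can be sketched as follows. This is a minimal illustration and not the code from `base.py`: the `registry` dict and the `unique_name` helper are hypothetical names introduced here.

```python
def unique_name(name, registry):
    """Return `name` with underscores appended until it is not in `registry`."""
    while name in registry:
        name += "_"
    return name


# `registry` stands in for whatever namespace holds the generated classes.
registry = {}
for raw in ["Alanine", "Glycine", "Alanine"]:
    key = unique_name(raw, registry)
    registry[key] = object()  # placeholder for a generated Monomer subclass

print(sorted(registry))
```

The second "Alanine" is stored as "Alanine_", so both entries stay reachable under distinct names, which is what pickling by name needs.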
61 # We duplicate distinguished long names into Monomer itself, so that we
62 # can use Monomer.from_code3 to create the relevant type of monomer.
67 @classmethod
69 """Create all relevant subclasses of Monomer."""
77 """Returns one-letter code"""
81 """Monomers within same monomer type are compared by code1."""
91 """Overcome difficulties with pickle.
93 Pickle is unable to store `set`s/`dict`s that have objects referencing
94 back the `set`/`dict` itself, which `sequence` in monomer does.
95 ( http://bugs.python.org/issue9269 )
97 To sidestep the bug we store the monomer WITHOUT `sequence` attribute.
99 See also `Sequence.__setstate__`.
100 """
108 """OBSOLETE"""
115 """Common functions for alignment and sequence for dealing with markups.
116 """
119 """Hook to be called from __init__ of actual class."""
123 """Create a markup object, add to self. Return the created markup.
125 - `name` is name for markup in `self.markups` dictionary
126 - optional `markup_class` is class for created markup
127 - if optional `use_existing` is true, it is not an error if a markup
128 with the same name already exists (in this case, nothing is changed)
129 - optional keyword arguments are passed on to the markup constructor
131 For user markups you have to specify `name` and `markup_class`,
132 for the standard automatic markups just `name` is enough.
133 """
134 # We have to import markups here, and not in the module header
135 # so as not to create bad import loops.
136 # `base` module is used extensively in `markups` for inheritance,
137 # so breaking the loop here seems a lot easier.
151 """Remove markup."""
156 """Sequence of Monomers.
158 This behaves like list of monomer objects. In addition to standard list
159 behaviour, Sequence has the following attributes:
161 * name -- str with the name of the sequence
162 * description -- str with description of the sequence
163 * source -- str denoting source of the sequence
165 Any of them may be empty (i.e. hold an empty string)
166 """
169 """Mapping of related types. SHOULD be redefined in subclasses."""
172 """Description of object kind."""
175 """Sequence identifier."""
178 """Detailed sequence description."""
181 """Sequence source."""
192 """Append a new monomer to the sequence. Return the new monomer."""
194 "Please specify exactly one of: code1, code3, name"
207 @classmethod
209 """Create a sequence from a string of one-letter codes."""
222 """Returns sequence of one-letter codes."""
226 """Hash sequence by identity."""
230 """Overcome difficulties with pickle: add `monomer.sequence` after loading.
232 Pickle is unable to store `set`s/`dict`s that have objects referencing
233 back the `set`/`dict` itself, which `sequence` in monomer does.
234 ( http://bugs.python.org/issue9269 )
236 To sidestep the bug we store the monomer WITHOUT `sequence` attribute.
238 See also `Monomer.__getstate__`.
239 """
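The pairing of `Monomer.__getstate__` and `Sequence.__setstate__` described in the two docstrings can be sketched like this. It is a simplified illustration under stated assumptions, not the module's code: the constructors and the `append_monomer` signature are reduced to the bare minimum, and `Monomer.__setstate__` is added here only to make the sketch explicit.

```python
import pickle


class Monomer(object):
    def __init__(self, code1):
        self.code1 = code1
        self.sequence = None  # back-reference to the owning Sequence

    def __getstate__(self):
        # Store everything except the back-reference.
        state = dict(self.__dict__)
        state.pop("sequence", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


class Sequence(list):
    def __init__(self, name=""):
        list.__init__(self)
        self.name = name

    def append_monomer(self, code1):
        monomer = Monomer(code1)
        monomer.sequence = self
        self.append(monomer)
        return monomer

    def __setstate__(self, state):
        # Restore own attributes, then re-attach every monomer to this sequence.
        self.__dict__.update(state)
        for monomer in self:
            monomer.sequence = self


seq = Sequence("demo")
for code in "ACD":
    seq.append_monomer(code)

restored = pickle.loads(pickle.dumps(seq))
print(all(monomer.sequence is restored for monomer in restored))
```

Pickle fills in the list items before it calls `__setstate__`, which is why the loop over `self` already sees the restored monomers.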
244 @classmethod
246 """OBSOLETE."""
250 """Alignment. It is a list of Columns."""
253 """Mapping of related types. SHOULD be redefined in subclasses."""
256 """Ordered list of sequences in alignment. Read, but DO NOT FIDDLE!"""
259 """Description of object kind."""
262 """Initialize empty alignment."""
267 # Alignment grow & IO methods
268 # ==============================
271 """Add sequence to alignment. Return self.
273 If sequence is too short, pad it with gaps on the right.
274 """
283 """Add row from a string of one-letter codes and gaps. Return self."""
291 ]
298 """Add row from row_as_list representation and sequence. Return self."""
307 """Insert list of `columns` after position `n`."""
311 """Pad alignment with empty columns on the right to width n."""
316 """Append sequences from file to alignment. Return self.
318 If sequences in file have gaps (detected as characters belonging to
319 `gaps` set), treat them accordingly.
320 """
325 """Write alignment in FASTA file as sequences with gaps."""
329 # Data access methods for alignment
330 # =================================
333 """Return list of rows (temporary objects) in alignment.
335 Each row is a dictionary of { column : monomer }.
337 For gap positions there is no key for the column in row.
339 Each row has attribute `sequence` pointing to the sequence the row is
340 describing.
342 Modifications of row have no effect on the alignment.
343 """
344 # For now, the function returns a list rather than iterator.
345 # It remains to be seen whether memory performance here becomes critical,
346 # or whether random access is useful.
358 """Return list of rows (temporary objects) in alignment.
360 Each row here is a list of either monomer or None (for gaps).
362 Each row has attribute `sequence` pointing to the sequence of row.
364 Modifications of row have no effect on the alignment.
365 """
376 """Return list of string representation of rows in alignment.
378 Each row has attribute `sequence` pointing to the sequence of row.
380 `gap` is the symbol to use for gap.
381 """
396 """Return representation of row as list with `Monomers` and `None`s."""
400 """Return string representation of row in alignment.
402 String will have gaps represented by `gap` symbol (defaults to '-').
403 """
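The row accessors above can be illustrated with a small sketch. It assumes the data model stated elsewhere in this file (an alignment is a list of columns, each column a dict of { sequence : monomer }, with missing keys for gaps). The `Seq` class and the use of plain one-letter strings as monomers are simplifications made for this example.

```python
class Seq(object):
    """Hypothetical stand-in for a sequence; hashed by identity by default."""

    def __init__(self, name):
        self.name = name


def row_as_list(columns, sequence):
    """Return monomers of `sequence` across `columns`, with None for gaps."""
    return [column.get(sequence) for column in columns]


def row_as_string(columns, sequence, gap="-"):
    """Return one-letter codes of `sequence`, with `gap` for missing cells."""
    return "".join(
        gap if monomer is None else monomer
        for monomer in row_as_list(columns, sequence)
    )


a, b = Seq("a"), Seq("b")
# Each column maps sequence -> monomer; a missing key means a gap.
columns = [{a: "M", b: "M"}, {a: "K"}, {a: "L", b: "V"}]

print(row_as_string(columns, a))
print(row_as_string(columns, b))
```

Both helpers build new objects, so changing the returned list or string leaves `columns` untouched, in line with the "no effect on the alignment" note in the docstrings.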
412 """Return list of columns (temporary objects) in alignment.
414 Each column here is a list of either monomer or None (for gaps).
416 Items of column are sorted in the same way as alignment.sequences.
418 Modifications of column have no effect on the alignment.
419 """
429 # Alignment / Block editing methods
430 # =================================
433 """Remove all gaps from alignment and flush results to one side.
435 `whence` must be one of 'left', 'right' or 'center'
436 """
438 "aln.flush('left') is deprecated in favor of aln.realign(Left())"
439 )
451 """Remove all empty columns."""
457 """Turn all row positions into gaps (but keep sequences intact)."""
463 """Replace column contents with those of `new` alignment.
465 In other words: copy gap patterns from `new` to `self`.
467 `self.sequences` and `new.sequences` should have the same contents.
468 """
478 ]
484 """Realign self.
486 * apply function to self to produce a new alignment,
487 * update self to have the same gap patterns as the new alignment.
488 """
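The two steps of realignment can be sketched on a toy representation. The assumption here is that an alignment is just a dict of name -> gapped string, which is much simpler than the column-based model in this module. `flush_left` and `ungapped` are hypothetical helpers.

```python
def ungapped(row, gap="-"):
    """Return the row with all gap symbols removed."""
    return row.replace(gap, "")


def flush_left(rows, gap="-"):
    """Example realigner: move all residues left and pad on the right."""
    width = max(len(row) for row in rows.values())
    return {
        name: ungapped(row, gap).ljust(width, gap)
        for name, row in rows.items()
    }


def realign(rows, function):
    """Apply `function`, then copy its gap pattern back into `rows` in place."""
    new = function(rows)
    for name, row in rows.items():
        # A realigner may move gaps but must not change the residues.
        assert ungapped(new[name]) == ungapped(row)
    rows.update(new)
    return rows


rows = {"a": "M-KL", "b": "-MV-"}
realign(rows, flush_left)
print(rows)
```

Keeping the realigner as a plain function that returns a new alignment means the original object is only touched in the final update step.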
493 """Column of alignment.
495 Column is a dict of { sequence : monomer }.
497 For sequences that have a gap in this column, the key is not present in
498 the column.
499 """
502 """Mapping of related types. SHOULD be redefined in subclasses."""
505 """Return hash by identity."""
510 """Block of alignment.
512 Block is an intersection of several rows & columns. (The collections of
513 rows and columns are represented as ordered lists, to retain the display
514 order of Alignment or add the ability to tweak it.) Most blocks look like
515 a rectangular part of the alignment if you shuffle its rows the right way.
516 """
519 """Alignment the block belongs to."""
522 """List of sequences in block."""
525 """List of columns in block."""
527 @classmethod
529 """Build new block from alignment.
531 If sequences are not given, the block uses all sequences in alignment.
533 If columns are not given, the block uses all columns in alignment.
535 In both cases we use exactly the list used in alignment, thus, if new
536 sequences or columns are added to alignment, the block tracks this too.
537 """
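The tracking behaviour described above follows from sharing list objects instead of copying them. A minimal sketch, assuming heavily simplified `Alignment` and `Block` classes (the constructor signatures are hypothetical):

```python
class Alignment(list):
    """Simplified alignment: a list of columns plus a list of sequences."""

    def __init__(self):
        list.__init__(self)
        self.sequences = []


class Block(object):
    def __init__(self, alignment, sequences, columns):
        self.alignment = alignment
        self.sequences = sequences
        self.columns = columns

    @classmethod
    def from_alignment(cls, alignment, sequences=None, columns=None):
        if sequences is None:
            sequences = alignment.sequences  # same list object, not a copy
        if columns is None:
            columns = alignment  # the alignment itself is the list of columns
        return cls(alignment, sequences, columns)


aln = Alignment()
tracking = Block.from_alignment(aln)
frozen = Block.from_alignment(aln, sequences=list(aln.sequences))

aln.sequences.append("seq1")
aln.append({"seq1": "M"})

print(tracking.sequences, len(tracking.columns), frozen.sequences)
```

A block built from an explicit copy, like `frozen` here, keeps its own view and does not follow later changes to the alignment.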
549 """Insert list of `columns` after position `n`."""
560 """Base class for sequence and alignment markups.
562 We shall call either sequence or alignment a container. And we shall call
563 either monomers or columns elements respectively.
565 Markup behaves like a dictionary of [element] -> value.
567 Every container has a dictionary of [name] -> markup. It is Markup's
568 responsibility to add itself to this dictionary and to avoid collisions
569 while doing it.
570 """
573 """Name of markup elements."""
576 """If set to false, fileio should not save this markup."""
579 """Markup takes mandatory container and name and optional kwargs.
581 Markups should never be created by the user. They are created by
582 Sequence or Alignment.
583 """
589 """Recalculate markup values (if they are generated automatically)."""
590 pass
593 """Remove the traces of markup object. Do not call this yourself!"""
594 pass
596 @classmethod
598 """Restore markup from `record`. (Used for loading from file).
600 `record` is a dict of all metadata and data related to one markup. All
601 keys and values in `record` are strings, markup must parse them itself.
603 Markup values should be stored in `record['markup']`, which is a list
604 of items separated with either `record['separator']` or a comma.
605 """
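The separator rule for `record['markup']` can be sketched as below. `parse_markup_values` is a hypothetical helper, not part of the module, and it leaves the values as strings because each markup is expected to parse them itself.

```python
def parse_markup_values(record):
    """Split record['markup'] on record['separator'], defaulting to a comma."""
    separator = record.get("separator", ",")
    return record["markup"].split(separator)


# All keys and values in a record are strings.
with_default = {"name": "ss", "markup": "H,H,-,E"}
with_custom = {"name": "ss", "markup": "H;H;-;E", "separator": ";"}

print(parse_markup_values(with_default))
print(parse_markup_values(with_custom))
```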
609 """Save markup to `record`, for saving to file.
611 For description of `record` see docstring for `from_record` method.
613 If `keys` argument is given, restrict output to the given keys.
614 """
618 """Return list of elements in the container in proper order."""
622 """Return list of markup values in container.
624 Possible arguments:
626 - `map` -- a function, applied to each existing value
627 - `default` -- a value to return for non-existing values
629 If `default` is not specified, the function fails on markups that do
630 not have all of the values set.
631 """
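The effect of `map` and `default` can be sketched with a plain dict standing in for the markup. The function name `markup_values`, the `elements` argument and the `_MISSING` sentinel are assumptions made for this example. The parameter is called `map` to match the docstring, which shadows the builtin inside the function.

```python
_MISSING = object()  # sentinel: distinguishes "no default" from default=None


def markup_values(markup, elements, map=None, default=_MISSING):
    """Return markup values for `elements`, in order."""
    result = []
    for element in elements:
        if element in markup:
            value = markup[element]
            result.append(value if map is None else map(value))
        elif default is _MISSING:
            # No default given: fail on the first element without a value.
            raise KeyError(element)
        else:
            result.append(default)
    return result


markup = {"m1": "1", "m3": "3"}
elements = ["m1", "m2", "m3"]

print(markup_values(markup, elements, map=int, default=0))
```

In this sketch `map` is applied only to values that exist, and `default` is returned as given.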
642 """Markup for sequence.
644 Behaves like a dictionary of [monomer] -> value. Value may be anything
645 or something specific, depending on subclass.
647 Actual values are stored in monomers themselves as attributes.
648 """
657 """Remove the traces of markup object. Do not call this yourself!"""
662 """Return list of monomers."""
666 """Part of Mapping collection interface."""
672 """Part of Mapping collection interface."""
676 """Part of Mapping collection interface."""
680 """Part of Mapping collection interface."""
684 """Part of Mapping collection interface."""
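A markup that stores its values as attributes on the monomers, while still behaving like a mapping, could look roughly like this. The sketch builds on `collections.abc.MutableMapping` and a bare-bones `Monomer`. Both choices are assumptions and may differ from how the module implements its mapping methods.

```python
from collections.abc import MutableMapping


class Monomer(object):
    def __init__(self, code1):
        self.code1 = code1


class SequenceMarkup(MutableMapping):
    """Mapping of monomer -> value, stored as attribute `name` on monomers."""

    def __init__(self, sequence, name):
        self.sequence = sequence
        self.name = name

    def __getitem__(self, monomer):
        try:
            return getattr(monomer, self.name)
        except AttributeError:
            raise KeyError(monomer)

    def __setitem__(self, monomer, value):
        setattr(monomer, self.name, value)

    def __delitem__(self, monomer):
        try:
            delattr(monomer, self.name)
        except AttributeError:
            raise KeyError(monomer)

    def __iter__(self):
        # Only monomers that actually carry a value count as keys.
        return (m for m in self.sequence if hasattr(m, self.name))

    def __len__(self):
        return sum(1 for _ in self)


sequence = [Monomer("A"), Monomer("C")]
markup = SequenceMarkup(sequence, "ss")
markup[sequence[0]] = "H"

print(sequence[0].ss, len(markup), sequence[1] in markup)
```

With the values stored on the monomers, the markup object itself holds only the sequence reference and the attribute name.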
688 """Markup for alignment.
690 Is a dictionary of [column] -> value. Value may be anything or something
691 specific, depending on subclass.
692 """
701 """Return a list of columns."""
704 # vim: set ts=4 sts=4 sw=4 et: