Document retrieved from a search engine cache. Original address: http://kodomo.fbb.msu.ru/hg/allpy/file/cfcbd13f6761/repeats/test.py
Indexed: Mon Feb 4 03:59:19 2013
allpy: cfcbd13f6761 repeats/test.py

view repeats/test.py @ 864:cfcbd13f6761

Added fileio.BioPythonFile as a method to parse unknown file formats [closes #106]

Biopython can parse more formats than EMBOSS but, surprisingly, it cannot parse msf. Also, the current tests give no way to see whether a test used Biopython or EMBOSS for a particular IO task. This will likely be fixed in the 1.5.0 release with the new fileio system. For now, Biopython has precedence over EMBOSS, so an IO test of msf tests EMBOSS, and an IO test of Stockholm tests Biopython.

author Daniil Alexeyevsky <dendik@kodomo.fbb.msu.ru>
date Mon, 25 Jul 2011 14:40:41 +0400
parents 6070ac379ec8
children
import sys

from repeat_joiner import Interval, RepeatJoiner

rj = RepeatJoiner()
with open(sys.argv[1]) as infile:
    for line in infile:
        line = line.strip()
        if not line:
            continue
        c1, c2, from1, to1, from2, to2, ori1, ori2 = line.split()[:8]
        if c1 == 'DNA_1':
            continue  # skip the header row
        # Orientation is True only when the column holds 1.
        ori1 = int(ori1) == 1
        ori2 = int(ori2) == 1
        # The end coordinates are shifted by one before building intervals.
        from1 = int(from1)
        to1 = int(to1) + 1
        from2 = int(from2)
        to2 = int(to2) + 1

        r1 = Interval(rj, c1, from1, to1, ori1)
        r2 = Interval(rj, c2, from2, to2, ori2)
        Interval.pair(r1, r2)

rj.build_groups()
# Largest groups first.
rj.interval_groups.sort(key=len, reverse=True)

print("group\tchr\tchr_from\tchr_to\tgroup_from\tgroup_to\tori\tgroup_ori")
for i, interval_group in enumerate(rj.interval_groups):
    interval_group.sort(key=lambda interval: interval.group_start)
    prev = set()  # interval tuples already printed for this group
    for interval in interval_group:
        if interval.tuple() in prev:
            continue
        prev.add(interval.tuple())
        print("%i\t%s" % (i, str(interval).replace(' ', '\t')))
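
The row handling and the duplicate filter used by the script can be tried without the repeat_joiner module. The following is a minimal sketch, not part of allpy: the names parse_pair and unique_in_order are hypothetical, the sample rows are made up, and plain tuples stand in for Interval objects. It assumes that the "+ 1" in the script turns an inclusive end coordinate into an exclusive one, which the source does not state explicitly. Unlike the script, it silently skips rows with fewer than eight fields instead of raising an error.

```python
def parse_pair(line):
    """Parse one whitespace-separated row into two interval tuples.

    Each tuple is (chromosome, start, end, forward).
    Returns None for blank or short lines and for the header row,
    which is recognised by 'DNA_1' in the first field.
    """
    fields = line.split()
    if len(fields) < 8 or fields[0] == 'DNA_1':
        return None
    c1, c2, from1, to1, from2, to2, ori1, ori2 = fields[:8]
    # Assumption: input ends are inclusive, so adding 1 makes them exclusive.
    first = (c1, int(from1), int(to1) + 1, int(ori1) == 1)
    second = (c2, int(from2), int(to2) + 1, int(ori2) == 1)
    return first, second


def unique_in_order(items):
    """Yield each hashable item once, keeping first-seen order."""
    seen = set()
    for item in items:
        if item in seen:
            continue
        seen.add(item)
        yield item


# Made-up sample input: a header row, a data row, a blank line, a repeated row.
rows = [
    "DNA_1 DNA_2 from1 to1 from2 to2 ori1 ori2",
    "chr1 chr2 10 19 100 109 1 -1",
    "",
    "chr1 chr2 10 19 100 109 1 -1",
]
pairs = [pair for pair in map(parse_pair, rows) if pair is not None]
intervals = list(unique_in_order(iv for pair in pairs for iv in pair))
print(intervals)
```

Keeping a set of already seen keys next to the loop, as the script does with prev, preserves the sorted order of each group while dropping repeats, which a plain conversion to set would not.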