Document retrieved from a search engine cache. Original document address: http://mccmb.belozersky.msu.ru/2015/proceedings/abstracts/18.pdf
Modified: Mon Jun 15 15:39:59 2015
Indexed: Sat Apr 9 23:21:04 2016
The genome-wide analysis of large tandem repeats in closely related species
Dmitrii I. Ostromyshenskii
Institute of Cytology RAS, Saint-Petersburg, Russia necroforus@gmail.com

Olga I. Podgornaya
Institute of Cytology RAS, Saint-Petersburg, Russia opodg@yahoo.com

Large tandemly repeated sequences (TRs, or satellite DNA) are a necessary part of the genomes of higher eukaryotes and can comprise up to tens of percent of a genome. Much of the functional nature of TRs in any genome remains enigmatic because only a few tools are available for dissecting and elucidating TR functions. TRs are the most variable of all types of eukaryotic sequences, to the point of being species-specific. The mechanisms of fast TR evolution have not yet been determined. The widespread "library" hypothesis explains the occurrence of species-specific TRs (satellite DNAs) as a result of differential amplification within a pool of sequences shared by the genomes of related species [1]. The library concept is based on comparing TRs experimentally cloned from species of one genus, for example Tribolium beetles. Such an approach allows only a limited number of TRs to be compared; complete genomic TR sets have never been used for comparison. An in silico approach makes such a comparison of TR sets possible. Next-generation sequencing methods and the growing number of assembled genomes provide the material for bioinformatic extraction of a nearly complete TR set from any genome. A search for large TRs identified 62 TR families in the mouse genome, only two of which had been known before [2]. A bioinformatic approach that searches only for the major TR in each genome has been published; unfortunately, it is self-contradictory, since the major TR is defined as centromeric, which is not true [3]. The aim of the current work is to compare TR sets in the available genomes of closely related species. Our pipeline takes into account the basic TR characteristics: monomer length, the number of monomers in the array and the degree of diversity of the monomers within the array.
The methods include the following steps:
(1) extracting the whole TR set with the TRF program [4];
(2) filtering the extracted TR set: array length > 3000 bp, number of monomers > 4, array entropy > 1.76;
(3) removing nested arrays and arrays that have similar sequences but different monomer lengths;
(4) splitting the TR set into families by BLAST-defined similarity;
(5) comparing the TR families with Repbase to identify the known ones;
(6) comparing the resulting TR set of one species with those of the rest.
In the current work, only the top 5-6 TRs by representation are shown. TRF output analysis [2, 3] was performed with custom Python scripts. TRs are known to remain underrepresented in WGS and WGA datasets (table 1, [2]). The TR amount in the assemblies varies within ~0.01-0.1%, which is far less than the experimentally determined amount of the mouse major satellite (MaSat) alone (~8%) [2]. The TR amounts in the two M. musculus genome assemblies already checked fall within this interval (table 1); still, the small TR representation in a dataset reflects the representation of TR families in the genome [5].
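Steps (1) and (2) can be sketched in Python, the language the authors mention for their own scripts. This is a minimal sketch, not those scripts: it assumes the standard 15-column layout of TRF .dat repeat lines (start, end, period size, copy number, consensus size, percent matches, percent indels, score, A, C, G, T percentages, entropy, consensus, array sequence), and `TRArray`, `parse_trf_dat`, `composition_entropy` and `passes_filters` are hypothetical names. The thresholds are those of step (2); `composition_entropy` only illustrates what the entropy column measures.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2
from typing import Iterable, Iterator


@dataclass
class TRArray:
    """One tandem repeat array as reported by TRF (hypothetical container)."""
    seq_id: str
    start: int
    end: int
    period: int        # period size reported by TRF, bp
    copies: float      # number of monomers aligned to the consensus
    entropy: float     # TRF entropy of base composition, 0..2 bits
    consensus: str

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_trf_dat(lines: Iterable[str]) -> Iterator[TRArray]:
    """Yield arrays from TRF .dat lines, assuming 15 fields per repeat line."""
    seq_id = ""
    for raw in lines:
        line = raw.strip()
        if line.startswith("Sequence:"):
            parts = line.split(maxsplit=1)
            seq_id = parts[1] if len(parts) > 1 else ""
            continue
        fields = line.split()
        # Skip program headers, the "Parameters:" line and blank lines.
        if len(fields) != 15 or not fields[0].isdigit():
            continue
        yield TRArray(seq_id, int(fields[0]), int(fields[1]), int(fields[2]),
                      float(fields[3]), float(fields[12]), fields[13])


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the A/C/G/T composition of a sequence."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def passes_filters(tr: TRArray, min_len: int = 3000, min_copies: float = 4,
                   min_entropy: float = 1.76) -> bool:
    """Step (2): array length > 3000 bp, > 4 monomers, entropy > 1.76."""
    return (tr.length > min_len and tr.copies > min_copies
            and tr.entropy > min_entropy)


example = """Sequence: chr1
Parameters: 2 7 7 80 10 50 500
1000 4999 234 17.1 234 85 5 5000 30 20 20 30 1.97 ACGT ACGTACGT
10 2009 2 1000.0 2 90 0 3000 50 0 0 50 1.00 AT ATAT
"""
kept = [tr for tr in parse_trf_dat(example.splitlines()) if passes_filters(tr)]
print([(tr.seq_id, tr.start, tr.end, tr.length) for tr in kept])
# prints [('chr1', 1000, 4999, 4000)]
```

For a pure A/T array `composition_entropy` gives 1.0 bit, well below the 1.76 cut-off, while equal A/C/G/T composition gives the 2.0-bit maximum, so the cut-off discards compositionally simple arrays.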
Species                 Assembly                TR %
Mus musculus            Mm_Celera               0.122
Mus musculus            GRCm37                  0.026
Cricetulus griseus      C_griseus_v1.0          0.158
Mesocricetus auratus    MesAur1.0               0.1
Cavia porcellus         Cavpor3.0               0.023
Cavia aperea            CavAp1.0                0.013
Myotis brandtii         ASM41265v1              0.084
Myotis davidii          ASM32734v1              0.047
Myotis lucifugus        Myoluc2.0               0.16
Bos indicus             Bos_indicus_1.0         0.023
Bos mutus               BosGru_v2.0             0.012
Bos taurus              Bos_taurus_UMD_3.1.1    0.074
Bos taurus              Btau_4.6.1              0.144

Table 1. The amount of large TRs in mammalian genomes. The assemblies and the percentage of large TRs in each assembly are shown. TR % is calculated as the ratio of the summed length of all TR arrays to the total sequence length of the assembly.
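The TR % column follows directly from the definition in the caption of table 1. A minimal sketch, assuming each array is given as a (start, end) pair of 1-based inclusive coordinates and that the total assembly length is known; `tr_percent` is a hypothetical name.

```python
from typing import Iterable, Tuple


def tr_percent(arrays: Iterable[Tuple[int, int]], assembly_length: int) -> float:
    """Percent of the assembly covered by TR arrays.

    Each array is a (start, end) pair with 1-based inclusive coordinates. The
    array lengths are simply summed, so nested or overlapping arrays are
    assumed to have been removed beforehand (step 3 of the pipeline).
    """
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    total_tr = sum(end - start + 1 for start, end in arrays)
    return 100.0 * total_tr / assembly_length


# Two arrays of 4,000 bp and 3,000 bp in a 10 Mb toy sequence set
print(round(tr_percent([(1, 4000), (10001, 13000)], 10_000_000), 4))
# prints 0.07
```

For scale, 0.1% of a ~2.7 Gb assembly corresponds to only ~2.7 Mb of TR arrays.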

The genus Mus was compared first because the M. musculus genome is the best investigated in terms of TR content. We tried to find all 62 M. musculus TR families [2] in the raw reads of the M. caroli genome (Caroli Genome Project, PRJEB2188). Only a few M. musculus TRs are present in the M. caroli genome. The M. musculus major satellite (MaSat or GSATMM) occupies nearly 0.7% of the M. caroli genome, versus ~11% of the M. musculus genome. In the M. caroli genome we found 5 other M. musculus TR families (table 2). The amounts of 4 of them are of the same order in both species, the exception being TR-1590-A-MM, which belongs to the class of transposable element (TE)-related TRs [2]. Two previously cloned M. caroli TR families [6] are not found in the M. musculus genome [5].


Family          M. musculus    M. caroli
MaSat           11.3           0.66
TR-1590A-MM     2.24           0.04
TR-6A-MM        0.25           0.15
TR-31B-MM       0.22           0.19
TR-107A-MM      0.04           0.04
TR-57A-MM       0.004          0.044

Table 2. Representation (%) of M. musculus TR families in both genomes, calculated by aligning raw reads to TR arrays. Details of the method are given in [5].

The TR families found in the next three genera are mostly absent from Repbase, so the TR nomenclature consists of a short species name and the monomer length in bp (table 3).
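The naming scheme can be sketched as follows. The rule is inferred from the names in table 3 (first letter of the genus plus the first three letters of the species epithet, then the monomer length in bp, with -A, -B appended when two distinct families of one species share a monomer length, as in Mbra-80-A and Mbra-80-B); `species_prefix`, `tr_family_name` and `name_families` are hypothetical names.

```python
from string import ascii_uppercase
from typing import List, Optional


def species_prefix(binomial: str) -> str:
    """'Myotis brandtii' -> 'Mbra' (genus initial + first 3 letters of epithet)."""
    genus, epithet = binomial.split()[:2]
    return genus[0].upper() + epithet[:3].lower()


def tr_family_name(binomial: str, monomer_len: int,
                   variant: Optional[int] = None) -> str:
    """Build a name such as 'Cpor-123'; variant 0, 1, ... appends -A, -B, ..."""
    name = f"{species_prefix(binomial)}-{monomer_len}"
    if variant is not None:
        name += f"-{ascii_uppercase[variant]}"
    return name


def name_families(binomial: str, monomer_lens: List[int]) -> List[str]:
    """Name the families of one species, adding letters only for repeated lengths."""
    names = []
    for i, length in enumerate(monomer_lens):
        if monomer_lens.count(length) > 1:
            # The n-th family with this length (in input order) gets letter n.
            variant = monomer_lens[:i].count(length)
            names.append(tr_family_name(binomial, length, variant))
        else:
            names.append(tr_family_name(binomial, length))
    return names


print(name_families("Myotis brandtii", [258, 17, 80, 20, 80, 148]))
# prints ['Mbra-258', 'Mbra-17', 'Mbra-80-A', 'Mbra-20', 'Mbra-80-B', 'Mbra-148']
```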
Genus    Species         Top TR families                                               Number of TR families
Cavia    C. porcellus    Cpor-123, Cpor-783, Cpor-14, Cpor-208, Cpor-109               26
         C. aperea       Capp-123, Capp-14, Capp-208, Capp-1518                        10
Myotis   M. brandtii     Mbra-258, Mbra-17, Mbra-80-A, Mbra-20, Mbra-80-B, Mbra-148    133
         M. davidii      Mdav-20, Mdav-159, Mdav-41, Mdav-80                           105
         M. lucifugus    Mluc-381, Mluc-80, Mluc-154                                   26
Bos      B. taurus       Btau-1406, Btau-1413, Btau-686, Btau-48, Btau-54, Btau-18     65
         B. mutus        Bmut-1402, Bmut-702, Bmut-18                                  27
         B. indicus      Bind-1406, Btau-1211, Bind-686, Bind-18                       18
Repbase names of the known Bos TRs: BTSAT4/BTSAT5, BTSAT2/BTSAT3, BTSAT6

Table 3. TRs found in the assemblies listed in table 1. In each genus, the species with the highest number of TR families is taken as the reference (listed first); the top 5-6 TRs are shown. TRs similar in sequence (rather than in monomer length) are placed on the same line. The TR that is major in amount in each genome is shown in grey. Repbase names are given for the 3 known Bos TRs.
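Steps (4) and (6) can be sketched as grouping TRs into families from pairwise similarity hits and then asking which species each family occurs in. This is a sketch under stated assumptions, not the authors' procedure: it swaps in single-linkage clustering (union-find) over hits in BLAST tabular output (`-outfmt 6`, whose 3rd and 11th columns are percent identity and e-value), the thresholds `min_identity=80` and `max_evalue=1e-10` are hypothetical, and all function names are illustrative. The toy hits only mirror the Cavia relations described below (Cpor-123 shared with C. aperea, while Cpor-783 and Capp-1518 are each found in one species).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint sets of TR identifiers."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_families(tr_ids: Iterable[str], blast_lines: Iterable[str],
                     min_identity: float = 80.0,
                     max_evalue: float = 1e-10) -> List[Set[str]]:
    """Single-linkage families from BLAST -outfmt 6 lines (12 tab-separated columns)."""
    uf = UnionFind()
    for tr in tr_ids:
        uf.find(tr)  # every TR starts as its own family
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        identity, evalue = float(cols[2]), float(cols[10])
        if query != subject and identity >= min_identity and evalue <= max_evalue:
            uf.union(query, subject)
    groups: Dict[str, Set[str]] = defaultdict(set)
    for tr in list(uf.parent):
        groups[uf.find(tr)].add(tr)
    return list(groups.values())


def species_occurrence(families: List[Set[str]],
                       species_of: Dict[str, str]) -> List[Tuple[List[str], List[str]]]:
    """(sorted members, sorted species) per family; the prefix before '-' gives the species."""
    report = []
    for fam in families:
        species = sorted({species_of[tr.split("-")[0]] for tr in fam})
        report.append((sorted(fam), species))
    return sorted(report)


tr_ids = ["Cpor-123", "Cpor-783", "Capp-123", "Capp-1518"]
hits = ["Cpor-123\tCapp-123\t96.5\t123\t4\t0\t1\t123\t1\t123\t1e-50\t210"]
species_of = {"Cpor": "C. porcellus", "Capp": "C. aperea"}

for members, species in species_occurrence(cluster_families(tr_ids, hits), species_of):
    status = "shared" if len(species) > 1 else "species-specific"
    print(members, species, status)
```

Single linkage is the simplest choice here: one qualifying hit is enough to merge two families, so with permissive thresholds it tends to over-merge rather than split.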

Genus Cavia (guinea pig). The C. porcellus genome possesses 25 TR families and the C. aperea genome only 10. Nine of the 10 C. aperea TR families also exist in the C. porcellus genome; the exception is the major TR of C. aperea, Capp-1518. The C. porcellus genome has two major TRs: Cpor-783 is absent from the second genome, and Cpor-123 exists in the C. aperea genome as a minor one.

Genus Myotis (bat). There are no Myotis TRs in Repbase, but 133 TR families are found in the M. brandtii genome, 105 in the M. davidii genome and 26 in the M. lucifugus genome. Only 5 TR families exist in all three genomes; most TR families are species-specific. The major TR of M. davidii and that of M. lucifugus are common in sequence though they differ in monomer length, but the same TR is a minor one in M. brandtii. The major TR of M. brandtii is not identified at all in either of the other genomes.

Genus Bos (cattle). Three Bos TRs are known in Repbase, and all of them are found in all Bos assemblies. Still, the major TR differs among the Bos assemblies: in the B. taurus genome BTSAT4/BTSAT5 is the major TR, while BTSAT6 is the major TR family in the B. indicus genome. Most of the top TR families in the genus Bos exist in only two genomes or even in one, i.e. they are species-specific.

The lack of assembled genomes of closely related species limits the bioinformatic approach. We examined all the genomes available for this purpose (tables 1-3). The most exhaustive analysis of major TRs (one per species), covering ~300 animals and plants, revealed no readily apparent conserved characteristics; individual clades likely differ in the tendency of closely related species to have TRs that share conserved sequence characteristics [3]. We compared whole TR sets. Our data show that there are species-specific top TRs that are absent from the genomes of closely related species. In all three genera examined, the major TRs are species-specific and hardly exist in other species of the genus, even as minor ones. This finding makes the "library" hypothesis of TR evolution questionable.

Acknowledgments. This work was supported by the Russian Foundation for Basic Research (13-04-01739-a) and a grant from the Presidium of RAS (MCB).

1. Plohl, M., Meštrović, N., Mravinac, B. (2014). Chromosoma, 123(4), 313-325.
2. Komissarov, A. S. et al. (2011). BMC Genomics, 12(1), 531.
3. Melters, D. P. et al. (2013). Genome Biol, 14(1), R10.
4. Benson, G. (1999). Nucleic Acids Research, 27(2), 573-580.
5. Ostromyshenskii, D. I. et al. (2015). Tsitologiya, 57(2), 102-110.
6. Kipling, D. et al. (1995). Mol Cell Biol, 15, 4009-4020.