Original document: http://mccmb.belozersky.msu.ru/2013/abstracts/abstracts/127.pdf
From ChIP-Seq data to improved transcription factor binding sites models

Ivan V. Kulakovskiy1,2, Victor G. Levitsky3,4, Dmitry G. Oshchepkov3, Ilya E. Vorontsov2,5, Vsevolod J. Makeev1,2,6,7

1 Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov str. 32, Moscow, 119991, GSP-1, Russia
2 Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina str. 3, Moscow, 119991, Russia
3 Laboratory of Molecular Genetics Systems, Institute of Cytology and Genetics, Siberian Division of Russian Academy of Sciences, Lavrentiev Prospect 6, Novosibirsk, 630090, Russia
4 Faculty of Natural Sciences, Novosibirsk State University, Pirogova str. 2, Novosibirsk, 630090, Russia
5 Yandex Data Analysis School, Data Analysis Department, Moscow Institute of Physics and Technology, Leo Tolstoy Str. 16, Moscow, 119021, Russia
6 State Research Institute of Genetics and Selection of Industrial Microorganisms, 1st Dorozhny proezd, 1, Moscow, 117545, Russia
7 Moscow Institute of Physics and Technology, Institutskii per. 9, Dolgoprudny, 141700, Moscow Region, Russia

ivan.kulakovskiy@gmail.com

Sequence motif analysis is one of the basic components of transcriptional regulation studies in higher eukaryotes. In particular, motif finding methods are used to predict putative transcription factor binding sites (TFBS) in genomic regions. This requires a TFBS model, which is often represented as a positional weight matrix (PWM) built from a gapless multiple local alignment of experimentally identified TFBS sequences. Existing tools for TFBS prediction still rely mainly on PWMs based on the positional frequencies of single nucleotides. Our PWM-based algorithm ChIPMunk [1] competed successfully with other tools in several independent benchmarks, including a recent study by the DREAM consortium [2]. Modern high-throughput methods, including chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq), provide large amounts of data that can be used to build more advanced models accounting for positional dependencies in TFBS. We present our new tool, diChIPMunk [3] (http://autosome.ru/dichipmunk/), which produces dinucleotide PWMs that take into account dependencies between neighboring nucleotides in TFBS. Using several public ChIP-Seq datasets, we show that dinucleotide PWMs produced by diChIPMunk clearly outperform both existing published PWMs and novel PWMs constructed by ChIPMunk from the same data.
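To make the difference between the two model types concrete, here is a minimal Python sketch. It assumes that the binding sites are already given as a gapless alignment of equal-length uppercase ACGT strings. It builds a mononucleotide log-odds PWM (4 weights per position) and a dinucleotide one (16 weights for each pair of neighboring positions), then scores words by summing weights. The function names (`mono_pwm`, `di_pwm`, `score_mono`, `score_di`, `best_hit`), the pseudocount of 1, and the uniform background are hypothetical illustrative choices. They are not taken from ChIPMunk or diChIPMunk, and the sketch does not reproduce those tools' motif-discovery or normalization steps.

```python
import math
from itertools import product

# Assumption: sites are uppercase A/C/G/T strings of equal length that are
# already arranged as a gapless alignment. The motif-discovery step itself
# (what ChIPMunk/diChIPMunk actually do) is NOT reproduced here.
ALPHABET = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=2)]


def mono_pwm(sites, pseudocount=1.0, background=0.25):
    """Mononucleotide log-odds PWM: one column of 4 weights per position."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        counts = {n: pseudocount for n in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({n: math.log(c / total / background) for n, c in counts.items()})
    return pwm


def di_pwm(sites, pseudocount=1.0, background=1 / 16):
    """Dinucleotide log-odds PWM: 16 weights for each pair of neighboring positions."""
    width = len(sites[0])
    pwm = []
    for i in range(width - 1):
        counts = {d: pseudocount for d in DINUCLEOTIDES}
        for site in sites:
            counts[site[i:i + 2]] += 1
        total = sum(counts.values())
        pwm.append({d: math.log(c / total / background) for d, c in counts.items()})
    return pwm


def score_mono(pwm, word):
    """Sum of per-position weights for a word of the motif's width."""
    return sum(column[n] for column, n in zip(pwm, word))


def score_di(pwm, word):
    """Sum of weights of the overlapping dinucleotides word[i:i+2]."""
    return sum(column[word[i:i + 2]] for i, column in enumerate(pwm))


def best_hit(score, pwm, sequence, width):
    """Scan the given strand only; return (best score, start position)."""
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

Consider the toy alignment `["CATG"] * 5 + ["CTAG"] * 5`, in which the two middle positions are always AT or TA. Here `score_mono` gives CAAG exactly the same score as CATG, because each middle column is split 50/50 between A and T. `score_di` ranks CATG higher, because the dinucleotide AA never occurs at that position. This is the kind of neighboring-nucleotide dependency that a dinucleotide PWM can capture.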

This work was supported by a Dynasty Foundation Fellowship [to I.V.K.] and by the Russian Foundation for Basic Research [12-04-32082 to I.V.K.; 12-04-01736-a to D.G.O.].

1. I.V. Kulakovskiy et al. (2010) Deep and wide digging for binding motifs in ChIP-Seq data. Bioinformatics, 26(20):2622-3.
2. M.T. Weirauch et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat Biotechnol, 31(2):126-34.
3. I.V. Kulakovskiy et al. (2013) From binding motifs in ChIP-Seq data to improved models of transcription factor binding sites. J Bioinform Comput Biol, 11(1):1340004.