Original document: http://www.mccme.ru/albio/slides/makeev.pdf
State Research Center of Genetics and Selection of Industrial Microorganisms, GosNIIGenetika, Moscow, Russia

Some questions of interpretation of results for DNA-protein binding on tiling arrays

October 9, 2008

3rd workshop on algorithms in Molecular Biology, Moscow, 2008


ChIP-chip technology

From: http://www.tigr.org


Genome-wide location analysis at tiling arrays

From: http://www.nimblegen.com/

Platform     Probes                    Example study
NimbleGen    385,000 50- to 75-mers    RNA polymerase, Nature 436: 876-880 (2005)
Affymetrix   6 * 10^6 25-mers          Estrogen receptor, Nat Genet 38: 1289-1297 (2006)
Agilent      244,000 60-mers           Polycomb, Cell 125: 301-313 (2006)



Problem of data quality

· Mishybridization with mismatches -> "genome-wide"
· Hybridization signal depends on the CG content of a probe and of the test DNA fragment
· Length distribution of DNA fragments after sonication
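
Since the hybridization signal depends on base composition, one simple check is to compute the G+C fraction of every probe before comparing intensities. A minimal sketch, assuming probes are available as plain DNA strings; the function name `gc_fraction` and the example 25-mer are illustrative and do not come from the talk:

```python
def gc_fraction(probe: str) -> float:
    """Return the fraction of G and C bases in a probe sequence."""
    seq = probe.upper()
    if not seq:
        raise ValueError("empty probe sequence")
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical 25-mer probe, for illustration only.
example_probe = "GGGGCGGGGCATATATATATATATA"
print(gc_fraction(example_probe))
```

Probes can then be grouped by this value so that intensities are compared only among probes of similar composition.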



Correlation in binding to probes neighboring in the genome

[Figure: correlation C(d) between binding signals of probes separated by genomic distance d (bp); Chr21 data]
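
One way to estimate a curve like C(d) is to collect all probe pairs whose genomic distance falls into the same bin and compute the Pearson correlation of their signals. This is a sketch under assumptions: the talk does not state how C(d) was computed, and the function name, `bin_size`, and `max_distance` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def distance_correlation(positions, signals, bin_size=100, max_distance=2000):
    """Pearson correlation of signals for probe pairs, binned by distance.

    Returns a dict mapping the lower edge of each distance bin (bp)
    to the correlation of the pairs that fall into that bin.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    pairs = defaultdict(list)
    for a in range(len(order)):
        i = order[a]
        for b in range(a + 1, len(order)):
            j = order[b]
            d = positions[j] - positions[i]
            if d > max_distance:
                break  # probes are sorted, so all later ones are farther away
            pairs[d // bin_size].append((signals[i], signals[j]))

    result = {}
    for bin_index, xy in sorted(pairs.items()):
        n = len(xy)
        if n < 2:
            continue  # a single pair gives no correlation estimate
        mean_x = sum(x for x, _ in xy) / n
        mean_y = sum(y for _, y in xy) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in xy)
        var_x = sum((x - mean_x) ** 2 for x, _ in xy)
        var_y = sum((y - mean_y) ** 2 for _, y in xy)
        if var_x > 0 and var_y > 0:
            result[bin_index * bin_size] = cov / sqrt(var_x * var_y)
    return result
```

The pairwise loop is quadratic in the worst case; the `max_distance` cut-off keeps it practical when probes are densely tiled.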


Comparison with bioinformatics

· Sp1 ChIP on Affymetrix arrays
  - human chromosomes 21 and 22; 25+5 chip; PM and MM probes; two control hybridizations (input DNA and anti-GST)

· TRANSFAC contains many Sp1 binding sites

· Compare ChIP-chip results with bioinformatic predictions of Sp1 transcription factor binding sites


Regions predicted by ChIP-chip

PM - perfect match probe
MM - mismatch probe; measures mishybridization from other DNA segments
Input - DNA without the antibody extraction step
Window - region with statistically prevalent PM, usually ~1000 bp
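
A window with "statistically prevalent PM" can be illustrated with a one-sided sign test: within each window, count the probes where PM exceeds MM and ask how likely that count is if PM and MM were interchangeable. This is a substitute technique chosen for brevity, not the procedure used for the published regions; the function names, the `(position, pm, mm)` tuple layout, and the `alpha` cut-off are assumptions.

```python
from math import comb


def sign_test_pvalue(n_positive, n_total):
    """One-sided binomial tail P(X >= n_positive), X ~ Binomial(n_total, 0.5)."""
    tail = sum(comb(n_total, k) for k in range(n_positive, n_total + 1))
    return tail / 2 ** n_total


def enriched_windows(probes, window=1000, alpha=0.01):
    """Find windows where PM > MM prevails.

    probes: list of (position, pm, mm) tuples sorted by position.
    Each probe opens a window [position, position + window).
    Returns a list of (window_start, p_value) for windows with p <= alpha.
    """
    hits = []
    for i, (start, _, _) in enumerate(probes):
        j = i
        positive = 0
        while j < len(probes) and probes[j][0] < start + window:
            # Ties (PM == MM) are counted as not positive, which is conservative.
            positive += probes[j][1] > probes[j][2]
            j += 1
        p_value = sign_test_pvalue(positive, j - i)
        if p_value <= alpha:
            hits.append((start, p_value))
    return hits
```

Overlapping significant windows would normally be merged into one region afterwards; that step is omitted here.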



Experiments with isolated Sp1 computational hits
[Figure: histograms of the number of probes versus ChIP signal-to-noise ratio (S/N) for 1200 bp windows with no hits and 1200 bp windows with isolated hits; panels labeled 50 bp, 200 bp, and 500 bp]



ChIP-chip signal indicates site clusters, not individual sites!

The distribution of intensities in 500 bp windows is almost identical for windows with no PWM hits and windows with one PWM hit, but it is visibly shifted to the left for windows with 5 PWM hits.
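
Grouping windows by their number of PWM hits requires a motif scan followed by a per-window count. A minimal sketch, assuming a position weight matrix stored as one score dictionary per motif position; the toy matrix below simply rewards matches to `GGGCGG` and is not the Sp1 matrix used in the talk, and only the forward strand is scanned:

```python
def pwm_hits(sequence, pwm, threshold):
    """Return start positions where the PWM score reaches the threshold.

    pwm: list of dicts mapping base -> score, one dict per motif position.
    Bases missing from a column (e.g. N) get that column's lowest score.
    """
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        score = sum(col.get(base, min(col.values())) for col, base in zip(pwm, site))
        if score >= threshold:
            hits.append(start)
    return hits


def hits_per_window(hit_positions, sequence_length, window=500):
    """Count hits in consecutive non-overlapping windows."""
    counts = [0] * ((sequence_length + window - 1) // window)
    for pos in hit_positions:
        counts[pos // window] += 1
    return counts


# Toy matrix: +1 for a match to the hypothetical consensus, -1 otherwise.
toy_pwm = [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in "GGGCGG"]
toy_sequence = "AT" * 10 + "GGGCGG" + "AT" * 10
positions = pwm_hits(toy_sequence, toy_pwm, threshold=6.0)
print(positions, hits_per_window(positions, len(toy_sequence), window=25))
```

Windows can then be split into groups by hit count, and the probe intensity distributions of the groups compared.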


Conclusions I
· ChIP-chip is a weak filter that concentrates binding regions (up to 30-fold by our evaluation)
· The noise of ChIP-chip is very high
· With 1000 bp windows, only about 5% of high-scoring computational Sp1 sites in chromosomes 21 and 22 are covered
  - (Cawley et al., Cell, 2004)
· 50% of the ChIP-chip binding regions published by Affymetrix do not contain any signal recognizable with bioinformatics
· Regions identified by ChIP-chip are more likely clusters of binding sites than individual binding sites



Test ground: identification of the Sp1 binding motif

Key points:
· ChIP-chip regions are long and contain binding sites for many different proteins -> direct identification by bioinformatics is impossible
· SELEX gives some idea of the binding motif, usually distorted, but it does show binding to the test protein
· Footprint data can also contain mistakes, but can be used as a control, being independent of ChIP-chip and SELEX



Test set Sp1: obtaining clean data
[Diagram: pipeline for building the Sp1 test set]
· Using TRANSFAC as the base data source for binding sites of a selected factor
· TRANSFAC Sp1 entries: 629 sites in total, sequence lengths from 5 to 98 (22 on average)
· Each TRANSFAC entry gives a footprinted sequence and the nearest gene; the diagram marks 5000 bp on each side along the chromosome
· Filtering ambiguous entries
· Extracting the chromosome region containing the footprinted sequence: flank, footprinted sequence, flank
· Dataset: 233 sites in total, sequence lengths from 9 to 60 (25 on average)
· small-BiSMark database engine
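
The filtering and extraction steps can be sketched as follows: locate the footprinted sequence in the chromosome region on either strand, discard the entry if it is absent or matches more than once, and otherwise cut it out together with its flanks. This is an illustration under assumptions, not the small-BiSMark implementation; the function names, the "exactly one match" rule for ambiguity, and the default `flank` length are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT symbols are kept as-is)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text, pattern):
    """Return all start positions of pattern in text, overlaps included."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def extract_with_flanks(region, footprint, flank=50):
    """Return (flank + site + flank, strand), or None for ambiguous entries.

    An entry is treated as ambiguous when the footprint is missing from
    the region or occurs more than once across both strands. The returned
    sequence is always given on the strand of `region`.
    """
    region = region.upper()
    footprint = footprint.upper()
    matches = [(pos, "+") for pos in find_all(region, footprint)]
    rc = reverse_complement(footprint)
    if rc != footprint:  # avoid counting palindromic sites twice
        matches += [(pos, "-") for pos in find_all(region, rc)]
    if len(matches) != 1:
        return None
    pos, strand = matches[0]
    left = max(0, pos - flank)
    right = min(len(region), pos + len(footprint) + flank)
    return region[left:right], strand
```

Requiring a unique match is a strict choice: short footprints are dropped more often, which may account for part of the reduction from 629 to 233 sites, although the slides do not say so.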



Acknowledgments
- Vsevolod Makeev
- Andreas Heinzel (from technical university Hagenberg, Austria)
- Alexander Favorov
- Valentina Boeva (now at Universite Polytechniques, Palaiseau, France)
- Ivan Kulakovsky
- Dmitry Malko

Financial support: Russian Federation State Innovation Project; Russian Foundation of Basic Research; INTAS; Program in Molecular and Cellular Biology, Russian Academy of Sciences

Special thanks to BioBase GmbH