Документ взят из кэша поисковой машины. Адрес оригинального документа : http://www.mccme.ru/albio/slides/asarin.pdf
Дата изменения: Thu Oct 9 22:23:45 2008
Дата индексирования: Tue Oct 2 09:27:54 2012
Кодировка:
Поисковые слова: п п п п п п п

Modeling and studying RNA secondary structure
EugХne Asarin LIAFA, CNRS & Univ. Paris Diderot

Credits
Co-authors, partners and teachers:
· Vassily Lyubetsky, Alexander Seliverstov (IITP) · Thierry Cachat, Tayssir Touili (LIAFA)

Sponsor:
· CNRS/RAS convention d'Иchanges EVOLVER/REVERA

Special thanks to:
· HervИ Isambert (I. Curie)

Disclaimer
· I am not a bioinformatician (yet) · I am (still) a computer scientist with verification background
everything is a transition system one should explore its state space in a smart way

· This talk
More informatics than byology+physics More models than solutions More questions and speculations than answers

Motivating example: Classical Attenuation Regulation

Transcription and Translation

DNA AGCTGC

Transcription and Translation
Polymerase Gene

DNA

RNA

Ribosome

Amino Acids

Transcription: DNA to RNA, done by Polymerase Translation: RNA to Amino Acids , done by Ribosome

Gene Expression
Gene DNA

RNA

Ribosome

Amino Acids

Gene expressed if Polymerase reach the Gene

Classical Attenuation Regulation
Gene DNA

RNA

Ribosome

Depends on the structure of the RNA between Ribosome and Polymerase

RNA Secondary Structure
RNA: a sequence of nucleotides A, G, C, U with links A-U and G-C

ACACUGGCU CACCUUCGGGUGGGCCUUUCUCG

UC U C C A Helix C U C G ACACUG

G G G U G A G C CUUUCUGCG

RNA Secondary Structure
(a simplified view)

This structure is dynamic and changes very fast and can cause the slippage of the Polymerase

Polymerase Slippage
T-rich region: connection of Pol and DNA weakens
Gene

RNA

Ribosome

Polymerase Slippage
T-rich region: connection of Pol and DNA weakens
Gene

RNA

Ribosome

Slippage of the polymerase and the gene is not expressed: Termination

Polymerase Slippage
T-rich region: connection of Pol and DNA weakens
Gene

RNA

Ribosome

Polymerase reaches Gene and the Gene is expressed: Antitermination

Each of these two situations can happen with some probability.

Regulation mechanism : causal chain
· · · · · · Concentration of a product (say trp) Speed of Ribosome Dynamics of secondary structure Probability of Polymerase slippage Gene expression (production of trp)

Wanted
· A model of dynamics of the RNA secondary structure capable to predict the probability of gene expression.
Should be quantitative Should represent transient behaviours (steady state not enough) Should be validated by biological data on regulation

Other motivations
· Other kinds of regulation · Other alternative behaviors/structures on a RNA, e.g. in ribozymes · Scientific curiosity · Ineresting transfers between Bio and Info

Models, Analyses, Tools

Tools (2 examples)
· Rnamodel - Lyubetsky et al. · Kinefold Isambert et al.

The approach: Markov chain
· Features
The sequence is fixed. States of the MC : all possible secondary structures on this sequence (or part of it) Transitions: simple events (see below) Rates: determined by E

· Difficulties
As usual, many parameters are difficult to find The Markov Chain is huge and complex Only Monte-Carlo simulation is possible It is still heavy and slow

Some recipes
· · · · · · Find a succinct structured representation of MC Use on-the-fly state generation Use symbolic representations Use abstractions Use acceleration Use other advanced trchniques perfect simulation etc. · Think

A succinct representation
We suggest Probabilistic Rewriting Systems

The idea?
· Represent the RNA secondary structure by a term · Represent the dynamics of the secondary structure by rewriting rules

The set of windows
Polymerase RNA Ribosome

w =(R,P) R: position of the Ribosome in the RNA P: position of the Polymerase in the RNA

W={w=(R,P) s.t. 13

R

P

l}

l: the length of the regulatory region

The helices
B C

A

D f = (A,B,C,D)

The hypohelices
F G

f = (A,B,C,D) g = (E,F,G,H)

E B

H C

A

D

f(g)

The structure as a term
B R C

A

D

P

f = (A,B,C,D) w =(R,P)

w(f)

The structure as a term
B R A C F G

D

E

H

P

f = (A,B,C,D) g = (E,F,G,H) w =(R,P) w(f , g)

The structure as a term

R

f h g
w =(R,P)

j k P

w(f , g(h,k) , j)

The order is not important

The Dynamics as a Probabilistic Rewriting System

Extension and Reduction of a helix

B

C F E

G H

A

D

f = (A,B,C,D)
Rewriting rule:

g = (E,F,G,H)
g
One rule for multipe contexts

f

(De)-Composition of a Hypohelix

f h k

j

(De)-Composition of a Helix
f h g k j

Rewriting rule:

w(f , h , k , j) w( ,h,k)

w(f , g(h , k) , j) w( , g(h , k) )

Meta Rewriting rule:

(De)-Composition of a Helix
f h k j

m

(De)-Composition of a Helix
f h g k j

m

One rule for multipe contexts

Meta Rewriting rule:

m(

,h,k)

m ( , g ( h , k) )

The Window Movement
Polymerase RNA Ribosome

w =(R,P) Movement of the Ribosome: (R , P)(x) Movement of the Polymerase: (R , P) (x)

three rules for multipe contexts

(R+3 , P) (x') (R , P+1) (x)

Termination: Slippage of the Polymerase: (R , P) (x)

Rates of the rewriting Rules
· Rates of the Markov Chain determined by E. Two terms
Stacking energy (easy) Free energy (a bit obscure)

· Movements of Rib and Pol even more obscure

Other optimisations
Implemented or not

On-the-fly (everybody does it)
· States of the MC are created only when visited. · Still needed efficient data structures for the states · Needed efficient algorithms to find all the successors of a given state (and the rates)

Concretely
repeat 1000 times s=empty window repeat find all successors s' of s and rates s->s' s= a random s' until expressed or aborted output statistics

Symbolic representation
(not used yet) A data structure (close to a formula) to represent current set of states or probability distribution.
· Very successfull in verification domain · We tried probabilistic tree automata, they explode · Thinking

Abstraction
(everybody use it in a naive way)
· Aim : to have fewer states · Idea: to group together several states · Rnamodel and Kinefold : only maximal helices stored. · We try to group some close helices: 800000->300000 states · Abstraction-based algorithms should be done systematically

Abstraction (what if)
· Biological description is very abstract (terminator, antiterminator, noise) · What if ... a model is possible at this level? · T o t h in k

Acceleration
· Problem : many fast transitions without any progress, major event rare · Partial solution: group similar states together · A better one : Isambert's clustering · T o t h in k m o re

Advanced simulation methods
· « Perfect simulation » - nice but only for steady state · Other methods from Performance Evaluation mostly for N^k · To read and to think

More complex structures

Other structures
· Helices can be « flipped » not a big deal · They can be pseudoknotted · Biologically relevant

Consequences for modeling and analysis
· What are legal configurations? · How to compute free energy? · Data structure : a set of helices, a tree with shortcuts, a colored graphs? · Alphabet and state space much bigger... · Abstractions more involved. · Kinefold yes, Rnamodel in progress

Dreams
· To find simple abstract models · To analyze them w/o Monte-Carlo · To transfer techniques from Verification and Performance evaluation and backwards

Questions?

Continuous Markov Chain
· Probability for moving from s to s' within t time units is:

1e

( s , s ') t

Continuous Markov Chain
· Probability for moving from s to s' within t time units is:

( s, s ' ) (1 e E ( s)

E ( s )t

), E ( s ) =

( s, s ' )