Äîêóìåíò âçÿò èç êýøà ïîèñêîâîé ìàøèíû. Àäðåñ îðèãèíàëüíîãî äîêóìåíòà : http://www.mccme.ru/albio/slides/asarin.pdf
Äàòà èçìåíåíèÿ: Thu Oct 9 22:23:45 2008
Äàòà èíäåêñèðîâàíèÿ: Tue Oct 2 09:27:54 2012
Êîäèðîâêà:

Ïîèñêîâûå ñëîâà: arp 220
Modeling and studying RNA secondary structure
EugÕne Asarin LIAFA, CNRS & Univ. Paris Diderot


Credits
Co-authors, partners and teachers:
· Vassily Lyubetsky, Alexander Seliverstov (IITP) · Thierry Cachat, Tayssir Touili (LIAFA)

Sponsor:
· CNRS/RAS convention d'Èchanges EVOLVER/REVERA

Special thanks to:
· HervÈ Isambert (I. Curie)


Disclaimer
· I am not a bioinformatician (yet) · I am (still) a computer scientist with verification background
­ everything is a transition system ­ one should explore its state space in a smart way

· This talk
­ More informatics than byology+physics ­ More models than solutions ­ More questions and speculations than answers


Motivating example: Classical Attenuation Regulation


Transcription and Translation

DNA AGCTGC


Transcription and Translation
Polymerase Gene

DNA

RNA

Ribosome

Amino Acids

Transcription: DNA to RNA, done by Polymerase Translation: RNA to Amino Acids , done by Ribosome


Gene Expression
Gene DNA

RNA

Ribosome

Amino Acids

Gene expressed if Polymerase reach the Gene


Classical Attenuation Regulation
Gene DNA

RNA

Ribosome

Depends on the structure of the RNA between Ribosome and Polymerase


RNA Secondary Structure
RNA: a sequence of nucleotides A, G, C, U with links A-U and G-C

ACACUGGCU CACCUUCGGGUGGGCCUUUCUCG

UC U C C A Helix C U C G ACACUG

G G G U G A G C CUUUCUGCG


RNA Secondary Structure
(a simplified view)

This structure is dynamic and changes very fast and can cause the slippage of the Polymerase


Polymerase Slippage
T-rich region: connection of Pol and DNA weakens
Gene

RNA

Ribosome


Polymerase Slippage
T-rich region: connection of Pol and DNA weakens
Gene

RNA

Ribosome

Slippage of the polymerase and the gene is not expressed: Termination


Polymerase Slippage
T-rich region: connection of Pol and DNA weakens
Gene

RNA

Ribosome

Polymerase reaches Gene and the Gene is expressed: Antitermination

Each of these two situations can happen with some probability.


Regulation mechanism : causal chain
· · · · · · Concentration of a product (say trp) Speed of Ribosome Dynamics of secondary structure Probability of Polymerase slippage Gene expression (production of trp)


Wanted
· A model of dynamics of the RNA secondary structure capable to predict the probability of gene expression.
­ Should be quantitative ­ Should represent transient behaviours (steady state not enough) ­ Should be validated by biological data on regulation


Other motivations
· Other kinds of regulation · Other alternative behaviors/structures on a RNA, e.g. in ribozymes · Scientific curiosity · Ineresting transfers between Bio and Info


Models, Analyses, Tools


Tools (2 examples)
· Rnamodel - Lyubetsky et al. · Kinefold ­ Isambert et al.


The approach: Markov chain
· Features
­ The sequence is fixed. ­ States of the MC : all possible secondary structures on this sequence (or part of it) ­ Transitions: simple events (see below) ­ Rates: determined by E

· Difficulties
­ ­ ­ ­ As usual, many parameters are difficult to find The Markov Chain is huge and complex Only Monte-Carlo simulation is possible It is still heavy and slow


Some recipes
· · · · · · Find a succinct structured representation of MC Use on-the-fly state generation Use symbolic representations Use abstractions Use acceleration Use other advanced trchniques ­ perfect simulation etc. · Think


A succinct representation
We suggest Probabilistic Rewriting Systems


The idea?
· Represent the RNA secondary structure by a term · Represent the dynamics of the secondary structure by rewriting rules


The set of windows
Polymerase RNA Ribosome

w =(R,P) R: position of the Ribosome in the RNA P: position of the Polymerase in the RNA

W={w=(R,P) s.t. 13

R

P

l}

l: the length of the regulatory region


The helices
B C

A

D f = (A,B,C,D)


The hypohelices
F G

f = (A,B,C,D) g = (E,F,G,H)

E B

H C

A

D

f(g)


The structure as a term
B R C

A

D

P

f = (A,B,C,D) w =(R,P)

w(f)


The structure as a term
B R A C F G

D

E

H

P

f = (A,B,C,D) g = (E,F,G,H) w =(R,P) w(f , g)


The structure as a term

R

f h g
w =(R,P)

j k P

w(f , g(h,k) , j)

The order is not important


The Dynamics as a Probabilistic Rewriting System


Extension and Reduction of a helix

B

C F E

G H

A

D

f = (A,B,C,D)
Rewriting rule:

g = (E,F,G,H)
g
One rule for multipe contexts

f


(De)-Composition of a Hypohelix

f h k

j


(De)-Composition of a Helix
f h g k j

Rewriting rule:

w(f , h , k , j) w( ,h,k)

w(f , g(h , k) , j) w( , g(h , k) )

Meta Rewriting rule:


(De)-Composition of a Helix
f h k j

m


(De)-Composition of a Helix
f h g k j

m

One rule for multipe contexts

Meta Rewriting rule:

m(

,h,k)

m ( , g ( h , k) )


The Window Movement
Polymerase RNA Ribosome

w =(R,P) Movement of the Ribosome: (R , P)(x) Movement of the Polymerase: (R , P) (x)

three rules for multipe contexts

(R+3 , P) (x') (R , P+1) (x)

Termination: Slippage of the Polymerase: (R , P) (x)


Rates of the rewriting Rules
· Rates of the Markov Chain determined by E. Two terms
­ Stacking energy (easy) ­ Free energy (a bit obscure)

· Movements of Rib and Pol ­ even more obscure


Other optimisations
Implemented or not


On-the-fly (everybody does it)
· States of the MC are created only when visited. · Still needed efficient data structures for the states · Needed efficient algorithms to find all the successors of a given state (and the rates)


Concretely
repeat 1000 times s=empty window repeat find all successors s' of s and rates s->s' s= a random s' until expressed or aborted output statistics


Symbolic representation
(not used yet) A data structure (close to a formula) to represent current set of states or probability distribution.
· Very successfull in verification domain · We tried probabilistic tree automata, they explode · Thinking


Abstraction
(everybody use it in a naive way)
· Aim : to have fewer states · Idea: to group together several states · Rnamodel and Kinefold : only maximal helices stored. · We try to group some close helices: 800000->300000 states · Abstraction-based algorithms should be done systematically


Abstraction (what if)
· Biological description is very abstract (terminator, antiterminator, noise) · What if ... a model is possible at this level? · T o t h in k


Acceleration
· Problem : many fast transitions without any progress, major event rare · Partial solution: group similar states together · A better one : Isambert's clustering · T o t h in k m o re


Advanced simulation methods
· « Perfect simulation » - nice but only for steady state · Other methods from Performance Evaluation ­ mostly for N^k · To read and to think


More complex structures


Other structures
· Helices can be « flipped » not a big deal · They can be pseudoknotted · Biologically relevant


Consequences for modeling and analysis
· What are legal configurations? · How to compute free energy? · Data structure : a set of helices, a tree with shortcuts, a colored graphs? · Alphabet and state space much bigger... · Abstractions more involved. · Kinefold ­ yes, Rnamodel ­ in progress


Dreams
· To find simple abstract models · To analyze them w/o Monte-Carlo · To transfer techniques from Verification and Performance evaluation and backwards


Questions?


Continuous Markov Chain
· Probability for moving from s to s' within t time units is:

1e

( s , s ') t


Continuous Markov Chain
· Probability for moving from s to s' within t time units is:

( s, s ' ) (1 e E ( s)

E ( s )t

), E ( s ) =

( s, s ' )