Документ взят из кэша поисковой машины. Адрес
оригинального документа
: http://sp.cs.msu.ru/dvm/dvmhtm1107/eng/sys/cdvm/cdvmDDe.html
Дата изменения: Mon Feb 13 12:59:06 2006 Дата индексирования: Mon Oct 1 23:16:04 2012 Кодировка: |
C-DVM compiler |
Contents
2 The Structure of the Compiler
2.1 Data Structures of the Compiler
2.2 The Linked Memory
2.3 The Scanner and the String Memory
2.4 Tokens and their symbolic names
2.4.1 Terminals
2.4.2 Delimiters and operators
2.4.3 Keywords
2.4.4 C-DVM keywords and identifiers
2.6.1 Terminals, lists and braces
2.6.2 Declarations (of C)
2.6.3 Statements (of C)
2.6.4 Expressions (of C)
2.6.5 Directives (of C-DVM)
2.6.6 Clauses and subdirectives (of C-DVM)
3 Compilation of C-DVM constructs
3.1.1 DISTRIBUTE directive
3.1.2 GENBLOCK distribution format
3.1.3 ONTO clause
3.1.4 REDISTRIBUTE directive
3.1.5 ALIGN directive
3.1.6 REALIGN directive
3.1.7 TEMPLATE clause
3.1.8 CREATE_TEMPLATE directive
3.2.1 PARALLEL directive
3.2.2 ACROSS clause
3.2.3 PROCESSORS directive and NUMBER_OF_PROCESSORS() function
3.2.4 TASK directive
3.2.5 MAP directive
3.2.6 TASK_REGION directive
3.2.7 ON-block construct
3.2.8 ON-loop construct
3.3.1 SHADOW clause
3.3.2 SHADOW_RENEW clause
3.3.3 SHADOW_GROUP directive
3.3.4 CREATE_SHADOW_GROUP directive
3.3.5 SHADOW_START directive
3.3.6 SHADOW_START clause
3.3.7 SHADOW_WAIT directive
3.3.8 SHADOW_WAIT clause
3.4.1 REMOTE_ACCESS directive and clause
3.4.2 REMOTE_GROUP directive
3.4.3 PREFETCH directive
3.4.4 RESET directive
3.4.5 Remote references
3.5.1 REDUCTION_GROUP directive
3.5.2 REDUCTION clause
3.5.3 Reduction variables and operations
3.5.4 REDUCTION_START directive
3.5.5 REDUCTION_WAIT directive
3.6.1 Creation and deletion of distributed arrays
3.6.2 Static distributed arrays
3.6.3 References to distributed data
3.6.4 Own computation
3.6.5 Initialization and completion of parallel execution
3.6.6 Input-output functions
3.7.1 Performance analyzer. Loops
3.7.2 Performance analyzer. INTERVAL directive
3.7.3 Debugger. Data tracing
3.7.4 Debugger. Computation tracing
3.7.5 Sequential code
C-DVM is the C language extended by special annotations for specifying parallel execution of a program. These annotations are called DVM-directives. C-DVM compiler translates an annotated C-DVM program to a SPMD stile program that contains calls to Run-Time Library (RTL).
Besides "pure" parallel code the compiler can produce an "extended" debugging code to use features of the performance analyzer (PPPA) and debugger, and also a "sequential" code (i.e. without RTL calls) with such debugging extensions.
For portability the compiler is written in ANSI-C and runs in command line mode (under OS UNIX and WINDOWS) using the following command:
c_dvm [ options ] [ -o outfile ] infile
where:
infile - an obligatory parameter; a source C-DVM file; it is suppoused that a program is valid C program, compilied by any usual C compiler.
-o outfile -- an optional parameter: the name of output file with converted program; it is cdvm_out.c by default;
options - one or more from the following options:
-s - produce sequential program (no DVM, trace or statistics only); a parallel program is generated by default;
tracing modes:
-d1 - tracing distirbuted data updates,
-d2 - tracing all accesses to distributed
data,
-d3 - tracing all data updates,
-d4 - tracing all accesses to all data.
trace accumulation modes:
-e1 - parallel loops and surrounding
sequential loops,
-e2 - parallel loops and user defined INTERVALs;
-e3 - e1 + e2,
-e4 - e3 + all sequential loops.
others:
-v - Verbose mode, output messages to the
screen,
-w - enable all Warnings,
-w- - disable all Warnings,
-xNNN - eXtent of internal tables (-x16
by default).
2 The Structure of the Compiler
The main program of the Compiler performs the following steps:
2.1 Data Structures of the Compiler
Compilation is made upon an internal representation of a program.The main data structures used for the internal representation are the following:
Sizes of these structures determines the maximal size of a source program. If the size is not enough it can be "scaled" by command line parameter -xNNN.
The hash tables are accessed by the function HTfind(item,len,ht,ht_cmp,ht_new). It searches an item of length len in a hash table ht using ht_cmp function to compare items and ht_new function to write new item to the table. It returns the cell number in the hash table where the item is kept. The cell contents is interpreted by a calling function and ht_cmp and ht_new functions.
The linked memory is used to store the following data:
Each node of the linked memory has the following four fields:
The linked memory is implemented by the following functions:
The attribute list is supported by the functions:
Semantic stack is implemented as a B-linked list in the linked memory with the root Stack and functions SSpush and SSpop. There are functions walk(N,sfun) and walkR(N) which implement standard walk throught the parsing tree invoking a given semantic function sfun for every visited node.
To handle declarations a stack of visibility scopes (as a list in the linked memory) and lists of currently defined variables for every scope are built. These structures are supported by the following functions:
2.3 The Scanner and the String Memory
The main scanner function is scan() function. On every invocation it:
LXcode is the code of the current token, namely: for C and C-DVM keywords and C delimiters it is their internal number; and for identifiers or constants it is their lexical category number. In the latter case LXvalue is an internal number of the identifier or constant, i.e. a reference to a node with the code TOKEN. String representation of identifiers and constants are kept in the string memory and used in generation step. For parser they are represented by their internal numbers. Internal number is assigned to an identifier at its first occurence. The scanner looks for the current token in the token table (hash function is used), and adds new token to it if necessary.
Other scanner functions:
2.4 Tokens and their symbolic names
2.4.1 Terminals
LXEOF "LXEOF" -- end of file TOKEN "TOKEN" -- token in the input stream DLM "DLM" -- delimiter in the input stream KWD "KWD" -- keyword in the input stream LXX "LXX" -- undelimited list item and tail LXI "IDENT" -- identifier LXN "NUMBER" -- numeric constant LXC "LXC" -- character constant LXR "REAL" -- floating point constant LXS "STRING" -- string constant CPP "CPP" -- preprocessor directive as string
2.4.2 Delimiters and operators
SEMIC ";" COMMA "," LBA "(" LBS "{" LBK "[" RBA ")" RBK "]" RBS "}" POINT "." QUEST "?" COLON ":" DIV "/" ADIV "/=" COMM "/*" -- C comment beginning CPPCOMM "//" -- C++ comment beginning MOD "%" AMOD "%=" ADD "+" AADD "+=" INC "++" SUB "-" ASUB "-=" DEC "--" ARROW "->" MUL "*" AMUL "*=" ASSIGN "=" EQU "==" LT "<" LE "<=" LSH "<<" ALSH "<<=" GT ">" GE ">=" RSH ">>" ARSH ">>=" BNOT "~" NOT "!" NEQ "!=" BAND "&" AAND "&=" AND "&&" BOR "|" AOR "|=" OR "||" BXOR "^" AXOR "^="
2.4.3 Keywords
RETURN "return" IF "if" ELSE "else" WHILE "while" DO "do" FOR "for" BREAK "break" CONTINUE "continue" SWITCH "switch" CASE "case" DEFAULT "default" GOTO "goto" SIZEOF "sizeof" TYPEDEF "typedef" EXTERN "extern" STATIC "static" REGISTER "register" AUTO "auto" CONST "const" INT "int" SHORT "short" LONG "long" VOID "void" CHAR "char" SIGNED "signed" UNSIGNED "unsigned" FLOAT "float" DOUBLE "double" STRUCT "struct" UNION "union" ENUM "enum"
2.4.4 C-DVM keywords and identifiers
LX_DVM "DVM" PROCESSORS "PROCESSORS" DISTRIBUTE "DISTRIBUTE" BLOCK "BLOCK" GENBLOCK "GENBLOCK" ONTO "ONTO" ALIGN "ALIGN" WITH "WITH" TEMPLATE "TEMPLATE" SHADOW "SHADOW" LX_SG "SHADOW_GROUP" LX_RG "REDUCTION_GROUP" LX_RMG "REMOTE_GROUP" TASK "TASK" REDISTRIBUTE "REDISTRIBUTE" REALIGN "REALIGN" NEW "NEW" LX_CRTEMP "CREATE_TEMPLATE" PARALLEL "PARALLEL" ON "ON" REDUCTION "REDUCTION" SHRENEW "SHADOW_RENEW" ACROSS "ACROSS" LX_CRSG "CREATE_SHADOW_GROUP" CORNER "CORNER" SHSTART "SHADOW_START" SHWAIT "SHADOW_WAIT" SUM "SUM" PROD "PRODUCT" MAX "MAX" MIN "MIN" LX_OR "OR" LX_AND "AND" MAXLOC "MAXLOC" MINLOC "MINLOC" RSTART "REDUCTION_START" RWAIT "REDUCTION_WAIT" REMOTE "REMOTE_ACCESS" INDIRECT "INDIRECT_ACCESS" PREFETCH "PREFETCH" RESET "RESET" MAP "MAP" LX_TASKREG "TASK_REGION" INTERVAL "INTERVAL" DEBUG "DEBUG" LX_NUMBER_OF_PROC "NUMBER_OF_PROCESSORS" LX_FOR "FOR" LX_DO "DO" LX_main "main" LX_malloc "malloc" LX_free "free" optD0 "d0" optD1 "d1" optD2 "d2" optD3 "d3" optD4 "d4" optE0 "e0" optE1 "e1" optE2 "e2" optE3 "e3" optE4 "e4"
Purpose of the parser is evident:
The method of parsing is straitforward top-down descent with backtracking in not numerous doubtful cases. The structure of resulting tree is described in the following section.
The parser is implemented as a recursive function Parse, which uses an internal representation of the source syntax stored in the linked memory as a syntax tree. The syntax tree is built by the function STXinit using functions:
The parser also uses the following functions:
2.6 The parsing tree structure
The following table describes the structure of the parsing tree in such a manner. For all node codes (on the left) a fragment of source code where A and B denote subconstructs corresponding to subtrees, hanging on the fields A and B respectively, is presented. AB referes to the subtree hanging on the field A of the node hanging on the field B of the initial node. Where partitioning of a source code to subconstructs is not clear it is commented. Here only "free" structure is described, i.e. dependences and constrains following from syntax are not mentioned.
2.6.1 Terminals, lists and braces
TOKEN: -- A is the position in the string memory DLM: -- A is the internal code of delimiter KWD: -- A is the internal code of keyword LXI: -- A is reference to a token node LXN: -- A is reference to a token node LXC: -- A is reference to a token node LXR: -- A is reference to a token node LXS: -- A is reference to a token node CPP: -- A is preprocessor directive as a string COMMA: A , B -- comma-list LXX: A [B-tail] -- B-list XXlist: A [B-tail] -- B-list XXslist: A [B-tail] -- B-list LBA: "(" [A] ")" LBS: "{" [A ] "}" -- compound statement LBK: [A] "[" [B] "]" -- indexing
2.6.2 Declarations (of C)
LXX: [AA-DVM_directive] BA-decl [ B-next ] XXdecl: [AA-storage_class] BA-type B-declarator-list XXtype: [AA-storage_class] BA-type B-abstract-declarator
Memory class (one or more B-linked nodes):
STATIC: static [B] AUTO: auto [B] REGISTER: register [B] EXTERN: extern [B] TYPEDEF: typedef [B] CONST: const [B]
Types (simple types are B-linked):
STRUCT: struct [A] ["{" AB-fields "}"] UNION: union [A] ["{" AB-fields "}"] ENUM: enum [A] ["{" B "}"] VOID: void TYPEDEF: B - typedef-name UNSIGNED: unsigned [B] SIGNED: signed [B] CHAR: char [B] SHORT: short [B] LONG: long [B] INT: int [B] FLOAT: float [B] DOUBLE: double [B]
Declarators:
0: AA-declarator [BA-initializer] B-list-tail LXaster: * [B-"const"] A LXfun: A "(" [B] ")"
Initializers:
ASS: = A COLON: : A
Declarator list tail:
XXbody: "{" [A] "}" [B -- ";"] XXclist: , A SEMIC: ; -- end of list
2.6.3 Statements (of C)
XXoper: A IF: if "(" AA ")" BA [else B] WHILE: while "(" A ")" B DO: do B while "(" A ")" ; FOR: for "(" AA ; ABA ; BBA ")" B SWITCH: switch "(" A ")" "{" B "}" GOTO: goto A ; LXskip: ; RETURN: return [A] ; LX_FOR: FOR "(" AA , BA ")" B LX_DO: DO "(" AA-variable , BA ")" B CONTINUE: continue ; BREAK: break ; COLON: A : XXexpr: A ; DEFAULT: default: [B] CASE: case A : [B]
2.6.4 Expressions (of C)
LBK: A "[" B "]" POINT: A . B ARROW: A -> B A SS: A = B AADD: A += B ASUB: A -= B AMUL: A *= B ADIV: A /= B AMOD: A %= B ARSH: A >>= B ALSH: A <<= B AAND: A &= B AXOR: A ^= B AOR: A |= B QUEST: A ? AB : BB OR: A || B AND: A && B BOR: A | B BXOR: A ^ B BAND: A & B EQU: A == B NEQ: A != B GT: A > B GE: A >= B LT: A < B LE: A <= B RSH: A >> B LSH: A << B ADD: [A] + B SUB: [A] - B MUL: A * B DIV: A / B MOD: A % B LXinca: A ++ LXdeca: A -- NOT: ! B BNOT: ~ B SUB: - B ADD: + B INC: ++ B DEC: -- B SIZEOF: sizeof ( B ) LXaddr: & B LXcont: * B LXfun: A "(" [B] ")" LXcast: "(" A ")" B
2.6.5 Directives (of C-DVM)
LX_DVM: DVM "(" [B-"*"] A ")"
DISTRIBUTE: DISTRIBUTE [ AA [ ONTO BA ] ] [ B-subdirs ] ALIGN: ALIGN [ AA WITH BA ] [ B-subdirs ] LX_SG: SHADOW_GROUP LX_RG: REDUCTION_GROUP LX_RMG: REMOTE_GROUP TASK: TASK PROCESSORS: PROCESSORS REDISTRIBUTE: REDISTRIBUTE AA [ ONTO BA ] [ B-"NEW" ] REALIGN: REALIGN AA WITH BA [ B-"NEW" ] LX_CRTEMP: CREATE_TEMPLATE A MAP: MAP A ONTO B LX_CRSG: CREATE_SHADOW_GROUP A ":" B-Renewees SHSTART: SHADOW_START A SHWAIT: SHADOW_WAIT A RSTART: REDUCTION_START A RWAIT: REDUCTION_WAIT A PREFETCH: PREFETCH A RESET: RESET A PARALLEL: PARALLEL AA ON BA [ B-clauses ] REMOTE: REMOTE_ACCESS [ A ":" ] B LX_TASKREG TASK_REGION A [ B-clause ] ON: ON A INTERVAL: INTERVAL [ A ] DEBUG: DEBUG A-number B-Debug_modes Debug_modes are B-linked nodes:
optD0: -d0 [ B ] optD1: -d1 [ B ] optD2: -d2 [ B ] optD3: -d3 [ B ] optD4: -d4 [ B ] optE0: -e0 [ B ] optE1: -e1 [ B ] optE2: -e2 [ B ] optE3: -e3 [ B ] optE4: -e4 [ B ] Renewees subtree is an LXX-list of:
DVMshad: A-DVMshw [ B-"CORNER" ] DVMshw: [ A ] "[" AB [ ":" BB ] "]"
2.6.6 Clauses and subdirectives (of C-DVM)
TEMPLATE: ";" TEMPLATE A SHADOW: ";" SHADOW A-DVMshw SHRENEW: ";" SHADOW_RENEW B-Renewees SHSTART: ";" SHADOW_START A SHWAIT: ";" SHADOW_WAIT A ACROSS: ";" ACROSS B-Renewees REDUCTION: ";" REDUCTION [ A ":" ] B-Red_ops REMOTE: ":" REMOTE_ACCESS [ A ":" ] B Red_ops subtree is an LXX-list of:
SUM: SUM "(" A ")" PROD: PROD "(" A ")" MAX: MAX "(" A ")" MIN: MIN "(" A ")" LX_AND: AND "(" A ")" LX_OR: OR "(" A ")" MAXLOC: MAXLOC "(" A, B ")" MINLOC: MINLOC "(" A, B ")"
The tree transformation is performed in two steps:
The function UnPars traverses the tree and restores external text representation of the program. As this file is only temporary, formatting is relatively simple. Note that output program is written in terms of C-DVM macros, but not directly in terms of RTL calls. The final form of the macros may be found in the file cdvm_c.h.