Given an Extended Backus Naur Format (EBNF) description of the syntax of a language, it is relatively easy to generate code which will build parse trees from source programs. With a bit of additional annotation in the EBNF, an Abstract Syntax Tree (AST) builder can easily be generated.
For example, given the following description:
expression = term {`+' term <plus 2> | `-' term <minus 2>} term = factor {`*' factor <multiply 2> | `/' factor <divide 2>} factor = number | identifier <value 1> | `(' expression `)'
along with some additional information about numbers and identifiers, it is easy to generate a program that can turn an expression such as
a + b * (c - 1)
into the following abstract syntax tree:
It is equally easy to translate this tree into code like the following:
plus(value(identifier(`a')), multiply(value(identifier(`b')), minus(value(identifier(`c')), number(`1'))))
This code in turn, can easily be compiled and run, provided that suitable definitions exist for the function calls. Clearly, these functions can be specified in a programming language and constitute a formal definition of the semantics of the original language.
For example, modules identifier and number (written in the Slim programming language) fully specify the semantics of the expression language defined by the EBNF above. To obtain an executable version of the expression program, simply compile the textual version of the AST and link in the compiled versions of identifier and number. Details of how an actual compiler for the expression language defined above, can be constructed, appear in another section.
Expressions are particularly easy to handle. Basic control flow, such as instruction sequences, if-then-elses, and loops are almost as easy. Disguised gotos like loop exits and early returns from functions and procedures are somewhat less easy, while explicit gotos and exceptions are not easy (and perhaps better left out of programming languages).
Declarations and definitions can be treated as expressions which create objects such as constants, types, variables, functions and procedures and bind them to names in the current scope.