Lecture 3: Grammars, Derivations, Parse Trees, Scanning, Introduction to Oz

Syntactic Analysis of Programs

How are programs processed?

The initial input is linear—it is a sequence of symbols from the alphabet of characters.
A lexical analyzer (scanner, lexer, tokenizer) reads the sequence of characters and outputs a sequence of tokens.
A parser reads a sequence of tokens and outputs a structured (typically non-linear) internal representation of the program—a syntax tree (parse tree).
The syntax tree is further processed, e.g., by an interpreter or by a compiler.

We have seen some of these steps implemented in the mdc interpreter.

Program: if X == 1 then ...
Input: ‘i’ ‘f’ ‘ ’ ‘X’ ‘ ’ ‘=’ ‘=’ ‘ ’ ‘1’ ‘ ’ ‘t’ ‘h’ ‘e’ ‘n’ ...
Lexemization: ‘if’ ‘X’ ‘==’ ‘1’ ‘then’ ...
Tokenization: key(‘if’) var(‘X’) op(‘==’) int(1) key(‘then’) ...
Parsing: program(ifthenelse(eq(var(‘X’)
                            int(1))
                        ...
                        ...)
                ...)
Interpretation: execution according to language semantics
Compilation: code generation according to language semantics
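The scanning stages above can be sketched in a few lines of Python. This is only an illustration, not the mdc scanner: the keyword set, the operator set, and the rule that every non-keyword identifier becomes a `var` token are assumptions made for the example, and `tokenize` is a hypothetical name.

```python
import re

# Assumed keyword set, chosen only for this example.
KEYWORDS = {"if", "then", "else", "end"}

# One named group per lexeme class; '==' is listed before the single-character
# operators so that it is matched as one lexeme.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<int>\d+)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<op>==|[-+*/<>=])
""", re.VERBOSE)

def tokenize(text):
    """Turn a string of characters into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                       # whitespace separates lexemes, produces no token
        if kind == "int":
            tokens.append(("int", int(lexeme)))
        elif kind == "word":
            # A word is a keyword if it is in the assumed set, otherwise a variable.
            tokens.append(("key" if lexeme in KEYWORDS else "var", lexeme))
        else:
            tokens.append(("op", lexeme))
    return tokens

print(tokenize("if X == 1 then"))
# [('key', 'if'), ('var', 'X'), ('op', '=='), ('int', 1), ('key', 'then')]
```

Lexemization and tokenization happen in one pass here: the regular expression finds the next lexeme, and the name of the matching group decides which token is built from it.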

Derivations

Following the recipe for using a grammar explained earlier, we can derive sentences in the language $L(\Gamma)$ specified by a grammar $\Gamma$ in a sequence of steps.

In each step we transform one sentential form (a sequence of terminals and/or non-terminals) into another sentential form by replacing one non-terminal with the right-hand side of a matching rule.
The first sentential form is the start variable $v_s$ alone.
The last sentential form is a valid sentence, composed only of terminals.
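The three conditions above can be checked mechanically. The sketch below assumes a small hypothetical grammar, E ::= E + E | a, stored as a dictionary from non-terminals to lists of right-hand sides; sentential forms are lists of symbols, and the names `is_step` and `is_derivation` are made up for the example.

```python
# Hypothetical grammar used only for illustration: E ::= E + E | a
GRAMMAR = {"E": [["E", "+", "E"], ["a"]]}
START = "E"

def is_step(before, after, grammar=GRAMMAR):
    """True if `after` is `before` with one non-terminal replaced by the
    right-hand side of one of its rules."""
    for i, symbol in enumerate(before):
        for rhs in grammar.get(symbol, []):
            if before[:i] + rhs + before[i + 1:] == after:
                return True
    return False

def is_derivation(forms, grammar=GRAMMAR, start=START):
    """True if `forms` starts with the start variable alone, ends with
    terminals only, and every consecutive pair is a valid step."""
    if not forms or forms[0] != [start]:
        return False
    if any(symbol in grammar for symbol in forms[-1]):
        return False
    return all(is_step(a, b, grammar) for a, b in zip(forms, forms[1:]))

forms = [["E"], ["E", "+", "E"], ["a", "+", "E"], ["a", "+", "a"]]
print(is_derivation(forms))                        # True
print(is_derivation([["E"], ["a", "+", "a"]]))     # False: no single rule gives this step
```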

Rightmost and leftmost derivations

A derivation is a sequence of sentential forms beginning with the start variable (a single non-terminal) and ending with a valid sentence, that is, a sequence of terminals only.

A derivation such that in each step it is the leftmost non-terminal that is replaced is called a ‘leftmost derivation’.
A derivation such that in each step it is the rightmost non-terminal that is replaced is called a ‘rightmost derivation’.
There can be derivations that are neither leftmost nor rightmost.
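To make the difference concrete, the following sketch derives the same sentence twice, once always replacing the leftmost non-terminal and once always the rightmost. The grammar E ::= E + E | a and the function name `derive` are assumptions for the example; `choices` lists, for each step, the index of the rule to apply.

```python
# Hypothetical grammar used only for illustration: E ::= E + E | a
GRAMMAR = {"E": [["E", "+", "E"], ["a"]]}

def derive(start, choices, leftmost=True, grammar=GRAMMAR):
    """Return the sentential forms obtained by applying the chosen rules,
    each time to the leftmost (or rightmost) non-terminal."""
    form = [start]
    forms = [form]
    for choice in choices:
        positions = [i for i, symbol in enumerate(form) if symbol in grammar]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + grammar[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return [" ".join(f) for f in forms]

print(derive("E", [0, 1, 1], leftmost=True))
# ['E', 'E + E', 'a + E', 'a + a']
print(derive("E", [0, 1, 1], leftmost=False))
# ['E', 'E + E', 'E + a', 'a + a']
```

Both derivations end in the same sentence and differ only in the third sentential form. A derivation that replaces a non-terminal in the middle of a form with three or more non-terminals would be neither leftmost nor rightmost.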

Syntax Trees

A parse tree (a syntax tree) is a structured representation of a program.

Parse trees are generated in the process of parsing programs.
A parser is a function (a program) that takes as input a sequence of tokens and returns a nested data structure corresponding to a parse tree.

The data structure returned by the parser is an internal (intermediate) representation of the program. A parse tree can be used to:

interpret the program (in interpreted languages);
generate target code (in compiled languages);
optimize the intermediate code (in both interpreted and compiled languages);
analyze the intermediate code, e.g., perform static analysis or compute code metrics (in both interpreted and compiled languages).
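As a rough illustration of a parser as a function from tokens to a nested data structure, the sketch below handles an assumed toy grammar, expr ::= int { ('+' | '-') int }, and returns nested tuples. The token format, the node tags `add` and `sub`, and the names `parse`, `evaluate`, and `count_nodes` are all hypothetical; the tree is a simplified one that keeps only operators and operands.

```python
def parse(tokens):
    """Parse (kind, value) tokens for  expr ::= int { ('+' | '-') int }
    into nested tuples, grouping operators to the left."""
    pos = 0

    def parse_int():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != "int":
            raise SyntaxError(f"expected an integer at token {pos}")
        node = tokens[pos]
        pos += 1
        return node

    tree = parse_int()
    while pos < len(tokens):
        kind, value = tokens[pos]
        if kind != "op" or value not in ("+", "-"):
            raise SyntaxError(f"expected '+' or '-' at token {pos}")
        pos += 1
        tree = ("add" if value == "+" else "sub", tree, parse_int())
    return tree

def evaluate(tree):
    """Interpret the tree: compute the value of the expression."""
    if tree[0] == "int":
        return tree[1]
    left, right = evaluate(tree[1]), evaluate(tree[2])
    return left + right if tree[0] == "add" else left - right

def count_nodes(tree):
    """A simple code metric: the number of nodes in the tree."""
    if tree[0] == "int":
        return 1
    return 1 + count_nodes(tree[1]) + count_nodes(tree[2])

tokens = [("int", 7), ("op", "-"), ("int", 2), ("op", "+"), ("int", 1)]
tree = parse(tokens)
print(tree)               # ('add', ('sub', ('int', 7), ('int', 2)), ('int', 1))
print(evaluate(tree))     # 6
print(count_nodes(tree))  # 5
```

The same tree serves two of the uses listed above: `evaluate` interprets it, and `count_nodes` analyzes it without running it.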

Ambiguity

A grammar is ambiguous if some sentence can be parsed in more than one way, that is, if some program has more than one parse tree.
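A small experiment shows why ambiguity matters. The sketch below assumes the hypothetical grammar E ::= E - E | int, which is ambiguous, and enumerates every parse tree of a token list by trying each `-` as the top-level operator. The names `all_trees` and `evaluate` are made up for the example, and tokens are simply Python integers and the string "-".

```python
def all_trees(tokens):
    """All parse trees of `tokens` under the assumed grammar E ::= E - E | int."""
    if len(tokens) == 1 and isinstance(tokens[0], int):
        return [tokens[0]]
    trees = []
    for i, tok in enumerate(tokens):
        if tok == "-":
            # Treat this '-' as the root and parse both sides recursively.
            for left in all_trees(tokens[:i]):
                for right in all_trees(tokens[i + 1:]):
                    trees.append(("sub", left, right))
    return trees

def evaluate(tree):
    """Compute the value of a tree built by all_trees."""
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - evaluate(right)

for tree in all_trees([7, "-", 2, "-", 1]):
    print(tree, "=", evaluate(tree))
# ('sub', 7, ('sub', 2, 1)) = 6
# ('sub', ('sub', 7, 2), 1) = 4
```

The sentence 7 - 2 - 1 has two parse trees, and they evaluate to different results, so the grammar alone does not determine the meaning of the program.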
