Lecture Notes

Overview

From source code to an executable program, a compiler passes through several phases: lexical analysis (scanning), syntactical analysis (parsing), semantic analysis, intermediate code generation, optimization, and code generation.

Lexical analysis (scanning)

Regular languages can be recognized by finite automata (state machines) and described by regular expressions. Thus, we have three representations:

  1. a graph representation of an automaton, where nodes are states and edges are transitions. We have a single starting state and one or more accepting states. Transitions are marked with the sets of characters they apply to. Useful for visualization and optimization.
  2. a table representation of an automaton. Useful for writing programs that do what the graph does (see the sketch after this list).
  3. a regular expression representation. Useful for generating automaton programs automatically.
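
As a small illustration of the table form, here is a minimal Python sketch (my own, hypothetical example): a two-state DFA that accepts nonempty digit strings, with the transition table as the function delta and the run loop reading one character at a time.

# Table-driven DFA (sketch)
########################

ACCEPTING = {1}                      # state 0 = start, state 1 = accepting

def delta(state, ch):
    """Transition table: (state, character) -> next state, or None for the dead state."""
    if ch.isdigit():
        return {0: 1, 1: 1}[state]
    return None

def accepts(text):
    state = 0
    for ch in text:
        state = delta(state, ch)
        if state is None:            # fell off the table: reject
            return False
    return state in ACCEPTING

assert accepts("2024") and not accepts("") and not accepts("12a")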

Regular expressions are defined by five cases: the empty string Ρ, a single character a, concatenation RS, alternation R|S, and repetition R* (zero or more copies of R).

We can prove that an NFA (Nondeterministic Finite Automaton) can be constructed from any regular expression by constructing an NFA fragment for each of the five preceding cases and wiring the fragments together. This is the McNaughton-Thompson-Yamada algorithm (Thompson's construction).
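
A hedged sketch of the construction (my own Python rendering, not code from the course): each of the five cases yields a fragment with one start and one accept state, returned here as a triple (start, accept, transitions), and Ρ-edges are labelled None.

# Thompson-style NFA fragments (sketch)
########################

from itertools import count

_ids = count()
EPS = None                          # label used for epsilon-transitions

def _state():
    return next(_ids)

def _merge(*dicts):
    # States are globally unique, so keys never clash; copy the edge lists.
    return {s: list(edges) for d in dicts for s, edges in d.items()}

def epsilon():                      # case: the empty string
    s, a = _state(), _state()
    return s, a, {s: [(EPS, a)], a: []}

def symbol(c):                      # case: a single character c
    s, a = _state(), _state()
    return s, a, {s: [(c, a)], a: []}

def concat(m, n):                   # case RS: link m's accept to n's start
    trans = _merge(m[2], n[2])
    trans[m[1]].append((EPS, n[0]))
    return m[0], n[1], trans

def union(m, n):                    # case R|S: new start/accept around both fragments
    s, a = _state(), _state()
    trans = _merge(m[2], n[2])
    trans[s] = [(EPS, m[0]), (EPS, n[0])]
    trans[a] = []
    trans[m[1]].append((EPS, a))
    trans[n[1]].append((EPS, a))
    return s, a, trans

def star(m):                        # case R*: loop back, and allow skipping R entirely
    s, a = _state(), _state()
    trans = _merge(m[2])
    trans[s] = [(EPS, m[0]), (EPS, a)]
    trans[a] = []
    trans[m[1]] += [(EPS, m[0]), (EPS, a)]
    return s, a, trans

# NFA for (a|b)*c, as (start, accept, transitions):
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("c"))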

NFAs admit multiple transitions from a state on the same character, as well as transitions on the empty string Ρ. A deterministic finite automaton (DFA) does not.

A closure is the outcome of repeating a rule until the result stops changing, i.e. a fixed point (which, in general, may never be reached).
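
A minimal sketch of that fixed-point idea (a generic helper of my own, not from the slides): keep applying a step function until no new elements appear.

# Closure as a fixed point (sketch)
########################

def closure(seed, step):
    """Smallest set that contains `seed` and is closed under `step` (element -> iterable)."""
    result = set(seed)
    frontier = list(seed)
    while frontier:                  # terminates once the rule stops producing new elements
        for nxt in step(frontier.pop()):
            if nxt not in result:
                result.add(nxt)
                frontier.append(nxt)
    return result

The Ρ-closure used in subset construction below is exactly this, with step returning the Ρ-successors of a state.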

Subset construction is used to transform an NFA into a DFA. We want to group equivalent states (Ρ-closures) together. move(S, c) is the set of states that you can reach from the states in S when the input character is c. The steps (a code sketch follows the list):

  1. Number all states
  2. Write the transition table, with the Ρ-closure of move(S, c) for each visited state set S and each possible input c, starting from the Ρ-closure of the initial state
  3. Build the DFA according to the transition table. Any Ρ-closure that contains an original accepting state is an accepting state in the DFA.
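
A sketch of the whole procedure, assuming the NFA representation from the Thompson sketch above ({state: [(label, next_state), ...]}, with None as Ρ); each DFA state is a frozenset of NFA states.

# Subset construction (sketch)
########################

def eps_closure(states, trans):
    """All NFA states reachable from `states` via epsilon (None) edges."""
    result, frontier = set(states), list(states)
    while frontier:
        for label, nxt in trans.get(frontier.pop(), []):
            if label is None and nxt not in result:
                result.add(nxt)
                frontier.append(nxt)
    return frozenset(result)

def move(states, c, trans):
    """NFA states reachable from `states` over a single edge labelled c."""
    return {nxt for s in states for label, nxt in trans.get(s, []) if label == c}

def subset_construction(nfa_start, nfa_accepting, trans, alphabet):
    start = eps_closure({nfa_start}, trans)
    dfa_trans, accepting = {}, set()
    worklist, seen = [start], {start}
    while worklist:
        S = worklist.pop()
        if S & nfa_accepting:            # contains an original accepting state
            accepting.add(S)
        for c in alphabet:
            T = eps_closure(move(S, c, trans), trans)
            if not T:                    # dead state, omitted from the table
                continue
            dfa_trans[S, c] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    return start, accepting, dfa_trans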

Systematic minimization is used to optimize a DFA.

  1. Start with two groups: all non-final states and all final states.
  2. Within a group, check whether the states are still equivalent, i.e. whether every input symbol sends all of them to the same group. States that behave differently are split off into their own group.
  3. Repeat until no group can be split further; each remaining group is merged into a single state of the minimized DFA (a code sketch follows the list).
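
A sketch of the refinement loop (Moore-style partition refinement; my own wording of the steps above), assuming a total transition function given as a dict (state, symbol) -> state.

# Partition refinement for DFA minimization (sketch)
########################

def minimize(states, accepting, delta, alphabet):
    """Return the final partition; every group becomes one state of the minimized DFA."""
    partition = [set(states) - set(accepting), set(accepting)]
    partition = [g for g in partition if g]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for group in partition:
            # Two states stay together only if every symbol sends them to the same group.
            def signature(s):
                return tuple(
                    next(i for i, g in enumerate(partition) if delta[s, c] in g)
                    for c in sorted(alphabet)
                )
            buckets = {}
            for s in group:
                buckets.setdefault(signature(s), set()).add(s)
            if len(buckets) > 1:
                changed = True
            new_partition.extend(buckets.values())
        partition = new_partition
    return partition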

To summarize, the McNaughton-Thompson-Yamada algorithm translates a regular expression to an NFA. Subset construction translates an NFA to a DFA. Systematic minimization optimizes a DFA, resulting in one with a minimal number of states. This pipeline is implemented in problem set 1 of the course.

// TODO: Lex

Syntactical analysis (parsing)

Grammars and production rules (EBNF)

A grammar is ambiguous when it admits several syntax trees for the same statement; for example, E → E - E | id gives two different trees (two groupings) for id - id - id. Ambiguous grammars are of no use to us and must be fixed, either by altering the language (rewriting the grammar) or by assigning priorities (precedence, associativity) to the productions.

Left factoring and left recursion

Left factoring pulls the common prefix out of alternative productions, so the choice between them is postponed until the input that distinguishes them has been reached.

E.g. from

A → α β1 | α β2

to

A → α A'
A' → β1 | β2

Left recursion elimination rewrites a production so that the recursion happens at the right end instead of the left (a top-down parser would otherwise call itself forever without consuming any input).

E.g. from

A → A α | β

to

A → β A'
A' → α A' | Ρ

Top-down parsing and LL(1) parser construction

The "LL" in LL(1) stands for a Left-to-right scan of the input that produces a Leftmost derivation; the "1" is the number of lookahead tokens.

Recursive descent means we follow the children of a tree node down to the bottom, where a terminal must match the input; then we return and continue with the next child. When there is a choice between productions, the lookahead symbol decides.

This requires that the grammar is suitable, but we can adapt grammars somewhat (left factoring, left recursion elimination).

We find the FIRST and FOLLOW sets of the nonterminals and use them to fill the LL(1) parse table: production A → α goes into entry (A, t) for every terminal t in FIRST(α), and also for every t in FOLLOW(A) when α can derive the empty string. If some entry receives more than one production, the grammar is not LL(1).
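
As a concrete illustration of recursive descent with one-token lookahead, here is a minimal sketch for a hypothetical, already left-recursion-free grammar E → T (("+" | "-") T)*, T → NUM | "(" E ")"; the parser also evaluates the expression, a small taste of the semantic actions discussed later.

# Recursive descent with one-token lookahead (sketch)
########################

import re

def tokenize(text):
    return re.findall(r"\d+|[()+\-]", text) + ["$"]   # "$" marks end of input

class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def lookahead(self):
        return self.toks[self.pos]

    def eat(self, expected=None):
        tok = self.toks[self.pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected}, got {tok}")
        self.pos += 1
        return tok

    def parse_E(self):                 # E -> T (("+" | "-") T)*
        value = self.parse_T()
        while self.lookahead() in ("+", "-"):
            op = self.eat()
            rhs = self.parse_T()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_T(self):                 # T -> NUM | "(" E ")"
        if self.lookahead() == "(":
            self.eat("(")
            value = self.parse_E()
            self.eat(")")
            return value
        return int(self.eat())         # NUM

assert Parser(tokenize("(1+2)-3-4")).parse_E() == -4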

Bottom-up parsing and LR(0) parser construction

Bottom-up parsing shifts (buffers) input symbols onto a stack until the top of the stack matches the right-hand side of a production, which is then reduced to its left-hand side; in this way it builds productions on top of productions.

Key ingredients: LR(0) items (productions with a dot marking how much of the right-hand side has been seen), sets of items that form the states of the LR(0) automaton, and a stack of states that drives the shift/reduce decisions.

The LR(0) automaton

To make the LR(0) automaton, start with the designated start item X' → . X

  1. Find its closure, make a state
  2. Follow all the transitions to new item sets
  3. Repeat from 1, until you reach the reduction X' → X at the other end (a sketch of closure and the transition function follows the list).
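
A sketch of the item machinery behind steps 1-2 (my own representation, with a toy grammar X' → X, X → X "+" a | a purely for illustration): an item is (head, body, dot).

# LR(0) item sets: closure and goto (sketch)
########################

GRAMMAR = {
    "X'": [("X",)],                    # augmented start production X' -> X
    "X": [("X", "+", "a"), ("a",)],
}

def item_closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            if dot < len(body) and body[dot] in GRAMMAR:     # dot in front of a nonterminal
                for rhs in GRAMMAR[body[dot]]:
                    item = (body[dot], rhs, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return frozenset(result)

def goto(items, symbol):
    """Follow one transition: advance the dot over `symbol`, then close the result."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return item_closure(moved)

start_state = item_closure({("X'", ("X",), 0)})
# Following the "X" edge reaches the state that contains the completed item X' -> X .
assert ("X'", ("X",), 1) in goto(start_state, "X")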

Syntax Directed Translation and attributes

Semantic actions can be attached to grammar productions and executed while parsing. They may, for example, derive (inherit) or synthesize symbol attributes. In a syntax tree representation, inherited attributes come from above and synthesized attributes come from below. L-attributed grammars allow synthesized attributes plus attributes inherited from the parent and the siblings to the left; top-down parsing supports L-attribution. In S-attribution, all attributes are synthesized; bottom-up parsing supports S-attribution.
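
A tiny sketch of S-attribution (my own example): every node carries a synthesized attribute val computed only from its children, which is exactly what a bottom-up parser can provide at reduction time.

# Synthesized attributes on a syntax tree (sketch)
########################

from dataclasses import dataclass

@dataclass
class Num:
    text: str            # the token's spelling

@dataclass
class Add:
    left: object
    right: object

def val(node):
    """Synthesized attribute: computed bottom-up from the children's attributes."""
    if isinstance(node, Num):
        return int(node.text)
    return val(node.left) + val(node.right)

assert val(Add(Num("1"), Add(Num("2"), Num("3")))) == 6

An inherited attribute (say, a type passed down from a declaration to its initializer) would instead flow from parent to child, which is why it needs the L-attributed, top-down setting.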

Semantics

Painting with broad strokes, semantic analysis checks what the grammar alone cannot express: that names are declared and used consistently, and that types match.

Type checking and type judgments (TODO, slides 14-15)

So, what's a type judgment? It is a statement of the form Γ ⊢ e : T, read "in type environment Γ, the expression e has type T". Typing rules let us derive new judgments from judgments we have already established.
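
A minimal checker sketch along these lines (a hypothetical mini-language of my own with integers, booleans, variables and "+", not the rules from the slides): the environment Γ is a dict, each branch of typeof mirrors one typing rule, and the nested recursive calls follow the shape of the proof tree for the final judgment.

# Type judgments as a recursive checker (sketch)
########################

def typeof(env, expr):
    """Return T such that env |- expr : T, or raise TypeError."""
    kind = expr[0]
    if kind == "int":                          # Gamma |- n : int
        return "int"
    if kind == "bool":                         # Gamma |- true/false : bool
        return "bool"
    if kind == "var":                          # Gamma(x) = T   =>   Gamma |- x : T
        return env[expr[1]]
    if kind == "+":                            # both operands must be int
        if typeof(env, expr[1]) == "int" and typeof(env, expr[2]) == "int":
            return "int"
        raise TypeError("'+' applied to non-integers")
    raise TypeError(f"unknown expression {expr!r}")

assert typeof({"x": "int"}, ("+", ("var", "x"), ("int", 1))) == "int"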

Proof Tree

Three-address code (TAC) (Slides 16)

Simple CPU design, X86_64 Assembly language, the run-time stack and function calls (slides 17-18, 20-21)

The basic x86 approach

Text segment, a function
########################

_factorial:
 (setup stackframe)
 (copy args)
 (compute)
 (remove stackframe)
 (return result)
x86 Example
########################

.globl main
.section .data
hello:
 .string "Hello, world! %ld\n"
.section .text
main:
 pushq %rbp              # set up the stack frame
 movq %rsp, %rbp
 movq $42, %rsi          # second argument to printf (the %ld)
 movq $hello, %rdi       # first argument: the format string
 movq $0, %rax           # variadic call: %al holds the number of vector registers used (none)
 call printf
 leave                   # tear down the stack frame
 ret
An activation record, on stack
########################

..
Next call's local variables
My frame ptr.
------------------- <-- frame pointer (%rbp) while the next call runs
Return address
Arguments
(Intermediate data)
Local variables
Caller's frame ptr.
------------------- <-- my frame pointer (%rbp)
Return address
Arguments
..

(Simple) Objects (slides 19)

Introduction to optimizations (slides 22)

Dataflow Analysis Framework (slides 27, 28, 29)

Analysis                     Domain                     Direction  Meet op.
LV (live variables)          variables                  backward   union
CP (copy propagation)        pairs of variables         forward    intersection
AE (available expressions)   expressions                forward    intersection
RD (reaching definitions)    assignments                forward    union
CF (constant folding)        variable "constant-ness"   forward    the meet of the constant lattice
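
As one concrete instance of the framework, here is a hedged sketch of live-variables analysis (LV: backward direction, union as meet), assuming a hypothetical control-flow graph described only by per-block def/use sets and successor edges.

# Live variables: backward analysis with union as meet (sketch)
########################

DEF  = {"B1": {"x"},  "B2": {"y"},  "B3": set()}
USE  = {"B1": set(),  "B2": {"x"},  "B3": {"x", "y"}}
SUCC = {"B1": ["B2"], "B2": ["B3"], "B3": []}

def live_variables(blocks):
    live_in  = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:                                    # iterate to a fixed point
        changed = False
        for b in blocks:
            out = set().union(*(live_in[s] for s in SUCC[b])) if SUCC[b] else set()
            inn = USE[b] | (out - DEF[b])             # the LV transfer function
            if out != live_out[b] or inn != live_in[b]:
                live_in[b], live_out[b] = inn, out
                changed = True
    return live_in, live_out

live_in, live_out = live_variables(["B3", "B2", "B1"])   # backward analysis: reverse order converges quickly
assert live_out["B1"] == {"x"} and live_in["B1"] == set()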

More optimization

Loop detection

Instruction selection

Register allocation