Compiler Theory

Language Modeling

Example Compilers

Source Language Compiler Target Language
C GCC x86
C Clang LLVM IR
LLVM IR LLVM x86
JavaScript 2016 Babel JavaScript 5
JavaScript Minifier Compressed JS

Languages in Practice versus in Theory

In practice, languages are defined by their syntaxes, semantics, and paradigms (functional vs. object-oriented vs. etc.).

In theory, languages are modelled by calculi along with set theory and logic.

Language Definition

A language is

Syntax

Language syntax is defined using a BNF grammar. This is an inductive construction.

b ::= true | false
n ::= 0 | 1 | 2 | … | 42 | … | 1000 | …
e ::= n | b | e + e | e - e | if e then e else e

An alternative is using set notation, which is avoided due to its verbosity.

b = {true, false}
n = ℕ
e = b ∪ n ∪ { e1 + e2 : e1 ∈ e, e2 ∈ e } ∪ …

Examples expressions in this languages are

  1. true
  2. 1 + 2
  3. e1 + e2 (where e1 and e2 are expressions)

Semantics

Semantics answer the question how to define eval.

eval(e) = v

Denotational Semantics

Denotational semantics interpret expressions as math, e.g.

For example, consider the semantic function 〚〛 : Expression -> SetExpression.

n〛 = ℕ
〚e1 + e2〛 = 〚e1〛+〚e2
〚e1 - e2〛 = 〚e1〛-〚e2

The evaluation function eval is then defined using the semantic function 〚〛.

eval(e) = 〚e〛

Operational Semantics

Operational semantics are defined using reduction relation (→) and conversion relation (→*), assuming some existing language.

Notation for reduction relations: expression → simpler expression, i.e. e → e’.

In the following example, n ::= 0 | add1 e.

if true then e1 else e2 → e1
if false then e1 else e2 → e2
0 + e2 → e2
add1 0 + e2 → e2

For example, 1 + 2, which is short for add1 0 + add1 (add1 0), → 0 + add1 (add1 (add1 0))add1 (add1 (add1 0)), i.e. 3.

Inference rule notation:

P₁  P₂  ...  Pₙ
———————————————
       C

where Pᵢ are premises and C is the conclusion.

Notation for reduction relations: expression →* expression, i.e. e →* e’.

For example,

————————— Reflexive
 e →* e'
 e₁ → e₂    e₂ →* e₃
————————————————————— Transitive
      e₁ →* e₃

The rest of the rules are congruence rules.

 e₁ →* e₁'    e₂ →* e₂'
———————————————————————— Congruence-Addition
  e₁ + e₂ →* e₁' + e₂'
                    e →* e'
——————————————————————————————————————————————— Congruence-If
 if e then e₁ else e₂ →* if e' then e₁ else e₂

etc.

Equivalence relation: e ≡ e’

 e →* e'
—————————
  e ≡ e'
 e ≡ e'
————————
 e' ≡ e

For example, consider 1 + 2.

                   0 + 3 → 3   3 →* 3
                  ————————————————————
 1 + 2 → 0 + 3         0 + 3 →* 3
———————————————————————————————————————
             1 + 2 → 3

Valid output/value (can put output in registers, heaps, etc. here): v ::= n | b.

The evaluation function eval is then defined as follows.

eval(e) = v | e → v

which is read as the result of evaluating e is v given that e converts to v.

Extending the language

Let’s extend the language to support pairs.

Syntax extension

e ::=| (e, e) | fst e | snd e

Semantic extensions

Reduction rules

These should be the most primitive rules/operations.

fst (e₁, e₂) → e₁
snd (e₁, e₂) → e₂

The rules for fst and snd are called elimination rules. The introduction rule for a pair is its constructor.

Conversion rules

 e₁ →* e₁'   e₂ →* e₂'
——————————————————————— Congruence-Pair
(e₁, e₂) →* (e₁', e₂')
     e →* e'
————————————————— Congruence-fst
 fst e →* fst e'
     e →* e'
————————————————— Congruence-snd
 snd e →* snd e'

Conversion rules let us work on sub-expressions.