Compiler Theory

Language Modeling

Example Compilers

Source Language	Compiler	Target Language
C	GCC	x86
C	Clang	LLVM IR
LLVM IR	LLVM	x86
JavaScript 2016	Babel	JavaScript 5
JavaScript	Minifier	Compressed JS

Languages in Practice versus in Theory

In practice, languages are defined by their syntaxes, semantics, and paradigms (functional vs. object-oriented vs. etc.).

In theory, languages are modelled by calculi along with set theory and logic.

Language Definition

A language is

a collection of expression,
some operations, and
shared properties (static and dynamic guarantees).

Syntax

Language syntax is defined using a BNF grammar. This is an inductive construction.

b ::= true | false
n ::= 0 | 1 | 2 | … | 42 | … | 1000 | …
e ::= n | b | e + e | e - e | if e then e else e

An alternative is using set notation, which is avoided due to its verbosity.

b = {true, false}
n = ℕ
e = b ∪ n ∪ { e₁ + e₂ : e₁ ∈ e, e₂ ∈ e } ∪ …

Examples expressions in this languages are

true
1 + 2
e₁ + e₂ (where e₁ and e₂ are expressions)

Semantics

Semantics answer the question how to define eval.

eval(e) = v

Denotational Semantics

Denotational semantics interpret expressions as math, e.g.

set theory
category theory
domain theory
logics (first-order logic, CTL, LTL, etc.)

For example, consider the semantic function 〚〛 : Expression -> SetExpression.

〚n〛 = ℕ
〚e₁ + e₂〛 = 〚e₁〛+〚e₂〛
〚e₁ - e₂〛 = 〚e₁〛-〚e₂〛

The evaluation function eval is then defined using the semantic function 〚〛.

eval(e) = 〚e〛

Operational Semantics

Operational semantics are defined using reduction relation (→) and conversion relation (→*), assuming some existing language.

Notation for reduction relations: expression → simpler expression, i.e. e → e’.

In the following example, n ::= 0 | add1 e.

if true then e₁ else e₂ → e₁
if false then e₁ else e₂ → e₂
0 + e₂ → e₂
add1 0 + e₂ → e₂

For example, 1 + 2, which is short for add1 0 + add1 (add1 0), → 0 + add1 (add1 (add1 0)) → add1 (add1 (add1 0)), i.e. 3.

Inference rule notation:

P₁  P₂  ...  Pₙ
———————————————
       C

where Pᵢ are premises and C is the conclusion.

Notation for reduction relations: expression →* expression, i.e. e →* e’.

For example,

————————— Reflexive
 e →* e'

 e₁ → e₂    e₂ →* e₃
————————————————————— Transitive
      e₁ →* e₃

The rest of the rules are congruence rules.

 e₁ →* e₁'    e₂ →* e₂'
———————————————————————— Congruence-Addition
  e₁ + e₂ →* e₁' + e₂'

                    e →* e'
——————————————————————————————————————————————— Congruence-If
 if e then e₁ else e₂ →* if e' then e₁ else e₂

etc.

Equivalence relation: e ≡ e’

 e →* e'
—————————
  e ≡ e'

 e ≡ e'
————————
 e' ≡ e

For example, consider 1 + 2.

                   0 + 3 → 3   3 →* 3
                  ————————————————————
 1 + 2 → 0 + 3         0 + 3 →* 3
———————————————————————————————————————
             1 + 2 → 3

Valid output/value (can put output in registers, heaps, etc. here): v ::= n | b.

The evaluation function eval is then defined as follows.

eval(e) = v | e → v

which is read as the result of evaluating e is v given that e converts to v.

Extending the language

Let’s extend the language to support pairs.

Syntax extension

e ::= … | (e, e) | fst e | snd e

Semantic extensions

Reduction rules

These should be the most primitive rules/operations.

fst (e₁, e₂) → e₁

snd (e₁, e₂) → e₂

The rules for fst and snd are called elimination rules. The introduction rule for a pair is its constructor.

Conversion rules

 e₁ →* e₁'   e₂ →* e₂'
——————————————————————— Congruence-Pair
(e₁, e₂) →* (e₁', e₂')

     e →* e'
————————————————— Congruence-fst
 fst e →* fst e'

     e →* e'
————————————————— Congruence-snd
 snd e →* snd e'

Conversion rules let us work on sub-expressions.