Compiler Theory
Language Modeling
Example Compilers
| Source Language | Compiler | Target Language |
|---|---|---|
| C | GCC | x86 |
| C | Clang | LLVM IR |
| LLVM IR | LLVM | x86 |
| JavaScript 2016 | Babel | JavaScript 5 |
| JavaScript | Minifier | Compressed JS |
Languages in Practice versus in Theory
In practice, languages are defined by their syntaxes, semantics, and paradigms (functional vs. object-oriented vs. etc.).
In theory, languages are modelled by calculi along with set theory and logic.
Language Definition
A language is
- a collection of expression,
- some operations, and
- shared properties (static and dynamic guarantees).
Syntax
Language syntax is defined using a BNF grammar. This is an inductive construction.
b ::= true | false
n ::= 0 | 1 | 2 | … | 42 | … | 1000 | …
e ::= n | b | e + e | e - e | if e then e else e
An alternative is using set notation, which is avoided due to its verbosity.
b = {true, false}
n = ℕ
e = b ∪ n ∪ { e1 + e2 : e1 ∈ e, e2 ∈ e } ∪ …
Examples expressions in this languages are
true1 + 2- e1
+e2 (where e1 and e2 are expressions)
Semantics
Semantics answer the question how to define eval.
eval(e) = v
Denotational Semantics
Denotational semantics interpret expressions as math, e.g.
For example, consider the semantic function 〚〛 : Expression -> SetExpression.
〚n〛 = ℕ
〚e1 + e2〛 = 〚e1〛+〚e2〛
〚e1 - e2〛 = 〚e1〛-〚e2〛
The evaluation function eval is then defined using the semantic function 〚〛.
eval(e) = 〚e〛
Operational Semantics
Operational semantics are defined using reduction relation (→) and conversion relation (→*), assuming some existing language.
Notation for reduction relations: expression → simpler expression, i.e. e → e’.
In the following example, n ::= 0 | add1 e.
if true then e1 else e2 → e1
if false then e1 else e2 → e2
0 + e2 → e2
add1 0 + e2 → e2
For example, 1 + 2, which is short for add1 0 + add1 (add1 0), → 0 + add1 (add1 (add1 0)) → add1 (add1 (add1 0)), i.e. 3.
Inference rule notation:
P₁ P₂ ... Pₙ
———————————————
C
where Pᵢ are premises and C is the conclusion.
Notation for reduction relations: expression →* expression, i.e. e →* e’.
For example,
————————— Reflexive
e →* e'
e₁ → e₂ e₂ →* e₃
————————————————————— Transitive
e₁ →* e₃
The rest of the rules are congruence rules.
e₁ →* e₁' e₂ →* e₂'
———————————————————————— Congruence-Addition
e₁ + e₂ →* e₁' + e₂'
e →* e'
——————————————————————————————————————————————— Congruence-If
if e then e₁ else e₂ →* if e' then e₁ else e₂
etc.
Equivalence relation: e ≡ e’
e →* e'
—————————
e ≡ e'
e ≡ e'
————————
e' ≡ e
For example, consider 1 + 2.
0 + 3 → 3 3 →* 3
————————————————————
1 + 2 → 0 + 3 0 + 3 →* 3
———————————————————————————————————————
1 + 2 → 3
Valid output/value (can put output in registers, heaps, etc. here): v ::= n | b.
The evaluation function eval is then defined as follows.
eval(e) = v | e → v
which is read as the result of evaluating e is v given that e converts to v.
Extending the language
Let’s extend the language to support pairs.
Syntax extension
e ::= … | (e, e) | fst e | snd e
Semantic extensions
Reduction rules
These should be the most primitive rules/operations.
fst (e₁, e₂) → e₁
snd (e₁, e₂) → e₂
The rules for fst and snd are called elimination rules. The introduction rule for a pair is its constructor.
Conversion rules
e₁ →* e₁' e₂ →* e₂'
——————————————————————— Congruence-Pair
(e₁, e₂) →* (e₁', e₂')
e →* e'
————————————————— Congruence-fst
fst e →* fst e'
e →* e'
————————————————— Congruence-snd
snd e →* snd e'
Conversion rules let us work on sub-expressions.