Compiler Theory
Language Modeling
Example Compilers
Source Language | Compiler | Target Language |
---|---|---|
C | GCC | x86 |
C | Clang | LLVM IR |
LLVM IR | LLVM | x86 |
JavaScript 2016 | Babel | JavaScript 5 |
JavaScript | Minifier | Compressed JS |
Languages in Practice versus in Theory
In practice, languages are defined by their syntaxes, semantics, and paradigms (functional vs. object-oriented vs. etc.).
In theory, languages are modelled by calculi along with set theory and logic.
Language Definition
A language is
- a collection of expression,
- some operations, and
- shared properties (static and dynamic guarantees).
Syntax
Language syntax is defined using a BNF grammar. This is an inductive construction.
b ::=
true
| false
n ::=
0
| 1
| 2
| … | 42
| … | 1000
| …
e ::=
n | b | e +
e | e -
e | if
e then
e else
e
An alternative is using set notation, which is avoided due to its verbosity.
b = {true
, false
}
n = ℕ
e = b ∪ n ∪ { e1 +
e2 : e1 ∈ e, e2 ∈ e } ∪ …
Examples expressions in this languages are
true
1 + 2
- e1
+
e2 (where e1 and e2 are expressions)
Semantics
Semantics answer the question how to define eval
.
eval(e) = v
Denotational Semantics
Denotational semantics interpret expressions as math, e.g.
For example, consider the semantic function 〚〛 : Expression -> SetExpression
.
〚n
〛 = ℕ
〚e1 +
e2〛 = 〚e1〛+〚e2〛
〚e1 -
e2〛 = 〚e1〛-〚e2〛
The evaluation function eval
is then defined using the semantic function 〚〛.
eval(
e)
= 〚e〛
Operational Semantics
Operational semantics are defined using reduction relation (→) and conversion relation (→*), assuming some existing language.
Notation for reduction relations: expression → simpler expression, i.e. e → e’.
In the following example, n ::=
0
| add1
e.
if true then
e1 else
e2 → e1
if false then
e1 else
e2 → e2
0 +
e2 → e2
add1 0 +
e2 → e2
For example, 1 + 2
, which is short for add1 0 + add1 (add1 0)
, → 0 + add1 (add1 (add1 0))
→ add1 (add1 (add1 0))
, i.e. 3
.
Inference rule notation:
P₁ P₂ ... Pₙ
———————————————
C
where Pᵢ are premises and C is the conclusion.
Notation for reduction relations: expression →* expression, i.e. e →* e’.
For example,
————————— Reflexive
e →* e'
e₁ → e₂ e₂ →* e₃
————————————————————— Transitive
e₁ →* e₃
The rest of the rules are congruence rules.
e₁ →* e₁' e₂ →* e₂'
———————————————————————— Congruence-Addition
e₁ + e₂ →* e₁' + e₂'
e →* e'
——————————————————————————————————————————————— Congruence-If
if e then e₁ else e₂ →* if e' then e₁ else e₂
etc.
Equivalence relation: e ≡ e’
e →* e'
—————————
e ≡ e'
e ≡ e'
————————
e' ≡ e
For example, consider 1 + 2
.
0 + 3 → 3 3 →* 3
————————————————————
1 + 2 → 0 + 3 0 + 3 →* 3
———————————————————————————————————————
1 + 2 → 3
Valid output/value (can put output in registers, heaps, etc. here): v ::=
n | b.
The evaluation function eval
is then defined as follows.
eval(
e)
= v | e → v
which is read as the result of evaluating e is v given that e converts to v.
Extending the language
Let’s extend the language to support pairs.
Syntax extension
e ::=
… | (
e,
e) | fst
e | snd
e
Semantic extensions
Reduction rules
These should be the most primitive rules/operations.
fst (e₁, e₂) → e₁
snd (e₁, e₂) → e₂
The rules for fst
and snd
are called elimination rules. The introduction rule for a pair is its constructor.
Conversion rules
e₁ →* e₁' e₂ →* e₂'
——————————————————————— Congruence-Pair
(e₁, e₂) →* (e₁', e₂')
e →* e'
————————————————— Congruence-fst
fst e →* fst e'
e →* e'
————————————————— Congruence-snd
snd e →* snd e'
Conversion rules let us work on sub-expressions.