Theory of Computing
Finite Automata
Finite automata
Why?
To study the languages related to FAs
- as a stepping stone to a richer computational model
- useful background for NLP and compilers
- understand regexp
Informal definition for computational machines
- Computational machine (informal)
Fix an alphabet Σ; a computational machine M for a decision problem on any input string ω ∈ Σ*, either
- outputs Accept and halt
- outputs Reject and halt
- run forever
Deterministic finite automata
- Deterministic finite automaton
a deterministic finite automaton is a 5-tuple M = (Q, Σ, δ, q₀, F) where
- Q is a finite set of states
- Σ is a finite set (the alphabet)
- δ : Q × Σ → Q (the transition function, which describes the transition and labels).
- q₀ ∈ Q (starting state)
- F ⊆ Q (accepting states)
Acceptance
A finite automaton M accepts intput string ω = ω₁ω₂…ωₙ ∈ Σ* if there exists a sequence of states r₀,r₁,…,rₙ ∈ Q such that
- r₀ = q₀
- rᵢ = δ(ri-1, ωᵢ) ∀i = 1..n
- rₙ ∈ F
r₀,r₁,…,rₙ are the sequence of states visited during the machine’s computation.
L(M) = { ω ∈ Σ* : M accepts ω }, the language accepted/decided/recognized by M.
Implicit error states
If δ is not fully specified, we assume an implicit transition to en error state.
Non-deterministic finite automata
- Non-deterministic finite automaton
a non-deterministic finite automaton is a 5-tuple M = (Q, Σ, δ, q₀, F), where
- Q, Σ, q₀, and F are the same as DFA’s
- δ : q × (Σ ∪ {ε}) → 2Q
Observation: ε is the empty string and cannot be in Σ.
Acceptance
An NFA accepts the string ω = ω₁ω₂…ωₙ if there exists a string y = y₁y₂…yₙ, where yᵢ ∈ Σ ∪ {ε} and a sequence r = r₀,r₁,..,rₘ, where rᵢ ∈ Q such that
- ω = y₁ ∘ y₂ ∘ … ∘ yₘ, i.e. y is ω with some ε inserted
- r₀ = q₀
- rᵢ ∈ δ(ri-1, qᵢ) ∀i = 1..m
- rₘ ∈ F
Regular expressions
R is a regular expression if R is any of the following, where R₁ and R₂ are regular expressions.
- a ∈ Σ
- ε (the empty string)
- ∅ (the empty set)
- R₁ ∪ R₂ or R₁ | R₂
- R₁ ∘ R₂ or R₁R₂
- R₁*
Short hand: Σ = (a₁ | a₂ | … | aₖ) where Σ = {a₁, a₂, …, aₖ}.
Notation: L(R) is the set of strings generated by the regular expression R.
R | L(R) |
---|---|
01 | { 0ⁱ1ʲ : i, j ≥ 0 } |
Σ001Σ | { all strings containing 001 as a substring } |
(ΣΣ)* | { all strings of even length } |
Σ1Σ1Σ1Σ | { ω : ω contains at least three 1’s} } |
Equivalence of DFAs, NFAs, and regular expressions
Theorem: languages accepted by DFAs = languages accepted by NFAs = languages described by regular expressions = regular languages.
Regular languages
- Regular language
- a regular language is any language accepted by some FA. The set of all regular languages is called the class of regular languages (i.e. set of sets).
Theorem: Regular languages are closed under union, concatenation, Kleene star, and complement.
Operations
Let L₁ and L₂ be languages.
- Concatenation
- L₁ ∘ L₂ = {x ∘ y : x ∈ L₁ and y ∈ L₂} (note L₁ ⊄ L₁ ∘ L₂)
- Union
- L₁ ∪ L₂ = {x : x ∈ A or x ∈ B}.
- Kleene star
- L₁* = {x₁ ∘ x₂ ∘ … ∘ xₖ : xᵢ ∈ A and k ≥ 0}
- Complement
- Σ L₁ = {x : x ∉ A}
- Intersection
- L₁ ∩ L₂ = {x : x ∈ A and x ∈ B}
Theorem (closure properties): If A and B are regular languages, then so are their concatenation, union, Kleene star, complement, and intersection.
ε closure
- ε closure
- for any S ⊆ Q, let E(S) be the set of all states in Q that can be reached by following any number of ε-transitions.
Limits to the power of finite automata
Can any language be described by a finite automaton?
- Example 1: Suppose L is a finite language. Is L regular?
- Yes. Suppose L = {x₁, x₂, …, xₙ}. Define Lᵢ = {xᵢ}, which is regular. We know Lᵢ ∪ Lⱼ is regular, so L₁ ∪ L₂ ∪ … ∪ Lₙ is regular.
- This construction fails if L is infinite (we’d get an “infinite” automaton).
- Example 2: Suppose L is regular. Is L² = {xy : x ∈ L and y ∈ L} regular?
- Yes, it is L ∘ L.
- Example 3: Suppose L is regular. Define Ldup = {xx : x ∈ L}.
- If L is finite, Ldup is also finite and hence is regular.
- If L = L(1*), then Ldup = L((11)*), which is regular.
- if L = Σ*, Ldup = {xx : x ∈ Σ*}.
- Intuitively, {xx : x ∈ Σ*} is not regular.
- In a finite automaton, its states are its memory.
- Any FA accepting Ldup must remember the first half to compare it to the second half.
- So FA must remember arbitrary large information but it can’t.
Finite automata can only read input once, from left to right.
They can only have finite memory.