Theory of Computing

Finite Automata

Finite automata

Why?

To study the languages related to FAs

as a stepping stone to a richer computational model
useful background for NLP and compilers
understand regexp

Informal definition for computational machines

Computational machine (informal)

Fix an alphabet Σ; a computational machine M for a decision problem on any input string ω ∈ Σ*, either

outputs Accept and halt
outputs Reject and halt
run forever

Deterministic finite automata

Deterministic finite automaton

a deterministic finite automaton is a 5-tuple M = (Q, Σ, δ, q₀, F) where

Q is a finite set of states
Σ is a finite set (the alphabet)
δ : Q × Σ → Q (the transition function, which describes the transition and labels).
q₀ ∈ Q (starting state)
F ⊆ Q (accepting states)

Acceptance

A finite automaton M accepts intput string ω = ω₁ω₂…ωₙ ∈ Σ* if there exists a sequence of states r₀,r₁,…,rₙ ∈ Q such that

r₀ = q₀
rᵢ = δ(r_i-1, ωᵢ) ∀i = 1..n
rₙ ∈ F

r₀,r₁,…,rₙ are the sequence of states visited during the machine’s computation.

L(M) = { ω ∈ Σ* : M accepts ω }, the language accepted/decided/recognized by M.

Implicit error states

If δ is not fully specified, we assume an implicit transition to en error state.

Non-deterministic finite automata

Non-deterministic finite automaton

a non-deterministic finite automaton is a 5-tuple M = (Q, Σ, δ, q₀, F), where

Q, Σ, q₀, and F are the same as DFA’s
δ : q × (Σ ∪ {ε}) → 2^Q

Observation: ε is the empty string and cannot be in Σ.

Acceptance

An NFA accepts the string ω = ω₁ω₂…ωₙ if there exists a string y = y₁y₂…yₙ, where yᵢ ∈ Σ ∪ {ε} and a sequence r = r₀,r₁,..,rₘ, where rᵢ ∈ Q such that

ω = y₁ ∘ y₂ ∘ … ∘ yₘ, i.e. y is ω with some ε inserted
r₀ = q₀
rᵢ ∈ δ(r_i-1, qᵢ) ∀i = 1..m
rₘ ∈ F

Regular expressions

R is a regular expression if R is any of the following, where R₁ and R₂ are regular expressions.

a ∈ Σ
ε (the empty string)
∅ (the empty set)
R₁ ∪ R₂ or R₁ | R₂
R₁ ∘ R₂ or R₁R₂
R₁*

Short hand: Σ = (a₁ | a₂ | … | aₖ) where Σ = {a₁, a₂, …, aₖ}.

Notation: L(R) is the set of strings generated by the regular expression R.

R	L(R)
01	{ 0ⁱ1ʲ : i, j ≥ 0 }
Σ001Σ	{ all strings containing 001 as a substring }
(ΣΣ)*	{ all strings of even length }
Σ1Σ1Σ1Σ	{ ω : ω contains at least three 1’s} }

Equivalence of DFAs, NFAs, and regular expressions

Theorem: languages accepted by DFAs = languages accepted by NFAs = languages described by regular expressions = regular languages.

Regular languages

Regular language: a regular language is any language accepted by some FA. The set of all regular languages is called the class of regular languages (i.e. set of sets).

Theorem: Regular languages are closed under union, concatenation, Kleene star, and complement.

Operations

Let L₁ and L₂ be languages.

Concatenation: L₁ ∘ L₂ = {x ∘ y : x ∈ L₁ and y ∈ L₂} (note L₁ ⊄ L₁ ∘ L₂)
Union: L₁ ∪ L₂ = {x : x ∈ A or x ∈ B}.
Kleene star: L₁* = {x₁ ∘ x₂ ∘ … ∘ xₖ : xᵢ ∈ A and k ≥ 0}
Complement: Σ L₁ = {x : x ∉ A}
Intersection: L₁ ∩ L₂ = {x : x ∈ A and x ∈ B}

Theorem (closure properties): If A and B are regular languages, then so are their concatenation, union, Kleene star, complement, and intersection.

ε closure

ε closure: for any S ⊆ Q, let E(S) be the set of all states in Q that can be reached by following any number of ε-transitions.

Limits to the power of finite automata

Can any language be described by a finite automaton?

Example 1: Suppose L is a finite language. Is L regular?
- Yes. Suppose L = {x₁, x₂, …, xₙ}. Define Lᵢ = {xᵢ}, which is regular. We know Lᵢ ∪ Lⱼ is regular, so L₁ ∪ L₂ ∪ … ∪ Lₙ is regular.
- This construction fails if L is infinite (we’d get an “infinite” automaton).
Example 2: Suppose L is regular. Is L² = {xy : x ∈ L and y ∈ L} regular?
- Yes, it is L ∘ L.
Example 3: Suppose L is regular. Define L^dup = {xx : x ∈ L}.
- If L is finite, L^dup is also finite and hence is regular.
- If L = L(1*), then L^dup = L((11)*), which is regular.
- if L = Σ*, L^dup = {xx : x ∈ Σ*}.
  - Intuitively, {xx : x ∈ Σ*} is not regular.
  - In a finite automaton, its states are its memory.
  - Any FA accepting L^dup must remember the first half to compare it to the second half.
  - So FA must remember arbitrary large information but it can’t.

Finite automata can only read input once, from left to right.

They can only have finite memory.