Theory of Computing

Finite Automata

Finite automata

Why?

To study the languages related to FAs

  1. as a stepping stone to a richer computational model
  2. useful background for NLP and compilers
  3. understand regexp

Informal definition for computational machines

Computational machine (informal)

Fix an alphabet Σ; a computational machine M for a decision problem on any input string ω ∈ Σ*, either

  1. outputs Accept and halt
  2. outputs Reject and halt
  3. run forever

Deterministic finite automata

Deterministic finite automaton

a deterministic finite automaton is a 5-tuple M = (Q, Σ, δ, q₀, F) where

  1. Q is a finite set of states
  2. Σ is a finite set (the alphabet)
  3. δ : Q × Σ → Q (the transition function, which describes the transition and labels).
  4. q₀ ∈ Q (starting state)
  5. F ⊆ Q (accepting states)

Acceptance

A finite automaton M accepts intput string ω = ω₁ω₂…ωₙ ∈ Σ* if there exists a sequence of states r₀,r₁,…,rₙ ∈ Q such that

r₀,r₁,…,rₙ are the sequence of states visited during the machine’s computation.

L(M) = { ω ∈ Σ* : M accepts ω }, the language accepted/decided/recognized by M.

Implicit error states

If δ is not fully specified, we assume an implicit transition to en error state.

Non-deterministic finite automata

Non-deterministic finite automaton

a non-deterministic finite automaton is a 5-tuple M = (Q, Σ, δ, q₀, F), where

  1. Q, Σ, q₀, and F are the same as DFA’s
  2. δ : q × (Σ ∪ {ε}) → 2Q

Observation: ε is the empty string and cannot be in Σ.

Acceptance

An NFA accepts the string ω = ω₁ω₂…ωₙ if there exists a string y = y₁y₂…yₙ, where yᵢ ∈ Σ ∪ {ε} and a sequence r = r₀,r₁,..,rₘ, where rᵢ ∈ Q such that

Regular expressions

R is a regular expression if R is any of the following, where R₁ and R₂ are regular expressions.

Short hand: Σ = (a₁ | a₂ | … | aₖ) where Σ = {a₁, a₂, …, aₖ}.

Notation: L(R) is the set of strings generated by the regular expression R.

R L(R)
01 { 0ⁱ1ʲ : i, j ≥ 0 }
Σ001Σ { all strings containing 001 as a substring }
(ΣΣ)* { all strings of even length }
Σ { ω : ω contains at least three 1’s} }

Equivalence of DFAs, NFAs, and regular expressions

Theorem: languages accepted by DFAs = languages accepted by NFAs = languages described by regular expressions = regular languages.

Regular languages

Regular language
a regular language is any language accepted by some FA. The set of all regular languages is called the class of regular languages (i.e. set of sets).

Theorem: Regular languages are closed under union, concatenation, Kleene star, and complement.

Operations

Let L₁ and L₂ be languages.

Concatenation
L₁ ∘ L₂ = {x ∘ y : x ∈ L₁ and y ∈ L₂} (note L₁ ⊄ L₁ ∘ L₂)
Union
L₁ ∪ L₂ = {x : x ∈ A or x ∈ B}.
Kleene star
L₁* = {x₁ ∘ x₂ ∘ … ∘ xₖ : xᵢ ∈ A and k ≥ 0}
Complement
Σ  L₁ = {x : x ∉ A}
Intersection
L₁ ∩ L₂ = {x : x ∈ A and x ∈ B}

Theorem (closure properties): If A and B are regular languages, then so are their concatenation, union, Kleene star, complement, and intersection.

ε closure

ε closure
for any S ⊆ Q, let E(S) be the set of all states in Q that can be reached by following any number of ε-transitions.

Limits to the power of finite automata

Can any language be described by a finite automaton?

Finite automata can only read input once, from left to right.

They can only have finite memory.