## 5 串匹配

2019-03-22 14:10 CST
2019-08-20 21:27 CST
CC BY-NC 4.0

We say that pattern $P$ occurs with shift $s$ in text $T$ (or, equivalently, that pattern $P$ occurs beginning at position $s+1$ in text $T$ ) if $0 \leqslant s \leqslant n - m$ and $T[s+1 \dots s+m] = P[1 \dots m]$ (that is, if $T[s+j] = P[j]$, for $1 \leqslant j \leqslant m$). If $P$ occurs with shift $s$ in $T$, then we call s a valid shift; otherwise, we call $s$ an invalid shift. The string-matching problem is the problem of finding all valid shifts with which a given pattern $P$ occurs in a given text $T$.

Algorithm Preprocessing Time Matching Time
Naive $0$ $O((n - m + 1)m) Rabin-Karp$\Theta(m)O((n - m + 1)m)
Finite automaton $O(m|\Sigma|)$ $\Theta(n)$
Knuth-Morris-Pratt $\Theta(m)$ $\Theta(n)$

## Notations

Notation Meaning
$\Sigma$ Alphabet
$\Sigma^*$ Set of all finite-length strings
$\varepsilon$ Zero-length empty string
$|x|$ Length of string
$xy$ Concatenation of strings
$w \sqsubset x$ $w$ is prefix of $x$
$w \sqsupset x$ $w$ is suffix of $x$

Lemma 32.1 (Overlapping-suffix lemma) Suppose that $x$, $y$ and $z$ are strings that $x \sqsupset z$ and $y \sqsubset z$. If $|x| < |y|$, then $x \sqsupset y$. If $|x| \geqslant |y|$, then $y \sqsupset x$. If $|x| = |y|$, then $x = y$.

## 32.1 The naive string-matching algorithm

• Time complexity: $O((n - m + 1)m)$,
• Worst-case running time: $\Theta((n - m + 1)m)$

## 32.2 The Rabin-Karp Algorithm

• Preprocessing: $\Theta(m)$,
• Worst-case: $\Theta((n - m + 1)m)$,
• Expected: $O(n) + O(m(v + n/q))$

## 32.3 String Matching with Finite Automata

### Finite Automata

A finite automaton $M$, is a 5-tuple $(Q, q_0, A, \Sigma, \delta)$ where

• $Q$ is a finite set of states.
• $q_0 \in Q$ is the start state,
• $A \subseteq Q$ is a distinguished set of accepting states,
• $\Sigma$ is a finite input alphabet,
• $\delta$ is a function from $Q \times \Sigma$ into $Q$, called the transition function of $M$.

Whenever the current state $q$ is a member of $A$, the machine $M$ has accepted the string read so fad. An input that is not accepted is rejected.

A finite automation $M$ induces a function $\phi$, called the final-state function, from $\Sigma^*$ to $Q$ such that $\phi(w)$ is the state $M$ ends up in after scanning the string $w$. Thus, $M$ accepts a string $w$ if and only if $\phi(w) \in A$. We define the function $\phi$ recursively, using the transition function:

$$\phi(\varepsilon) = q_0,$$ $$\phi(wa) = \delta(\phi(w), a) \text{ for } w \in \Sigma^*, a \in \Sigma$$

### String-matching Automata

In order to specify the string-matching automaton corresponding to a given pattern $P[1..m]$, we first define an auxiliary function $\sigma$, called the suffix function corresponding to $P$. The function $\sigma$ maps $\Sigma^*$ to ${0, 1, \dots, m}$ such that $\sigma(x)$ is the length of the longest prefix of $P$ that is also a suffix of $x$: $$\sigma(x) = \max {k: P_k \sqsupset x}$$

We define the string-matching automaton that corresponds to a given pattern $P[1..m]$ as follows:

• The state set $Q$ is ${0, 1, \dots, m}$. The start state $q_0$ is state $0$, and state $m$ is the only accepting state.
• The transition function $\delta$ is defined by the following equation, for any state $q$ and character $a$: $$\delta(q, a) = \sigma(P_qa)$$

### Finite Automoton Mater Algorithm

Lemma 32.2 (Suffix-function inequality) For any string $x$ and character $a$, we have $\sigma(xa) \leqslant \sigma(x) + 1$.

Lemma 32.3 (Suffix-function recursion lemma) For any string $x$ and character $a$, if $q = \sigma(x)$, then $\sigma(xa) = \sigma(P_qa)$.

Theorem 32.4 If $phi$ is the final-state function of a string-matching automaton for a given pattern $P$ and $T[1..n]$ is an input text for the automaton, then $$\phi(T_i) = \sigma(T_i)$$ for $i = 0, 1, \dots, n$.

### Computing the Transition Function

• Preprocessing: $O(m|\Sigma|)$,
• Matching: $\Theta(n)$.

Not required.