问题求解4-5 串匹配

分类:Problem Solving, 发布于:2019-03-22 14:10:00, 更新于:2019-04-19 00:09:40。 评论

We say that pattern $P$ occurs with shift $s$ in text $T$ (or, equivalently, that pattern $P$ occurs beginning at position $s+1$ in text $T$ ) if $0 \leqslant s \leqslant n - m$ and $T[s+1 \dots s+m] = P[1 \dots m]$ (that is, if $T[s+j] = P[j]$, for $1 \leqslant j \leqslant m$). If $P$ occurs with shift $s$ in $T$, then we call s a valid shift; otherwise, we call $s$ an invalid shift. The string-matching problem is the problem of finding all valid shifts with which a given pattern $P$ occurs in a given text $T$.

Algorithm Preprocessing Time Matching Time
Naive $0$ $O((n - m + 1)m)
Rabin-Karp $\Theta(m)$ $O((n - m + 1)m)
Finite automaton $O(m\|\Sigma\|)$ $\Theta(n)$
Knuth-Morris-Pratt $\Theta(m)$ $\Theta(n)$
Notation Meaning
$\Sigma$ Alphabet
$\Sigma^*$ Set of all finite-length strings
$\varepsilon$ Zero-length empty string
$\|x\|$ Length of string
$xy$ Concatenation of strings
$w \sqsubset x$ $w$ is prefix of $x$
$w \sqsupset x$ $w$ is suffix of $x$

Lemma 32.1 (Overlapping-suffix lemma) Suppose that $x$, $y$ and $z$ are strings that $x \sqsupset z$ and $y \sqsubset z$. If $|x| < |y|$, then $x \sqsupset y$. If $|x| \geqslant |y|$, then $y \sqsupset x$. If $|x| = |y|$, then $x = y$.

32.1 The naive string-matching algorithm

  • Time complexity: $O((n - m + 1)m)$,
  • Worst-case running time: $\Theta((n - m + 1)m)$

32.2 The Rabin-Karp Algorithm

  • Preprocessing: $\Theta(m)$,
  • Worst-case: $\Theta((n - m + 1)m)$,
  • Expected: $O(n) + O(m(v + n/q))$

32.3 String Matching with Finite Automata

Finite Automata

A finite automaton $M$, is a 5-tuple $(Q, q_0, A, \Sigma, \delta)$ where

  • $Q$ is a finite set of states.
  • $q_0 \in Q$ is the start state,
  • $A \subseteq Q$ is a distinguished set of accepting states,
  • $\Sigma$ is a finite input alphabet,
  • $\delta$ is a function from $Q \times \Sigma$ into $Q$, called the transition function of $M$.

Whenever the current state $q$ is a member of $A$, the machine $M$ has accepted the string read so far. An input that is not accepted is rejected.

A finite automation $M$ induces a function $\phi$, called the final-state function, from $\Sigma^*$ to $Q$ such that $\phi(w)$ is the state $M$ ends up in after scanning the string $w$. Thus, $M$ accepts a string $w$ if and only if $\phi(w) \in A$. We define the function $\phi$ recursively, using the transition function:

$$\phi(\varepsilon) = q_0,$$ $$\phi(wa) = \delta(\phi(w), a) \text{ for } w \in \Sigma^*, a \in \Sigma$$

String-matching Automata

In order to specify the string-matching automaton corresponding to a given pattern $P[1..m]$, we first define an auxiliary function $\sigma$, called the suffix function corresponding to $P$. The function $\sigma$ maps $\Sigma^*$ to $\{0, 1, \dots, m\}$ such that $\sigma(x)$ is the length of the longest prefix of $P$ that is also a suffix of $x$: $$\sigma(x) = \max \{k: P_k \sqsupset x\}$$

We define the string-matching automaton that corresponds to a given pattern $P[1..m]$ as follows:

  • The state set $Q$ is $\{0, 1, \dots, m\}$. The start state $q_0$ is state $0$, and state $m$ is the only accepting state.
  • The transition function $\delta$ is defined by the following equation, for any state $q$ and character $a$: $$\delta(q, a) = \sigma(P_qa)$$

Finite Automoton Mater Algorithm

Lemma 32.2 (Suffix-function inequality) For any string $x$ and character $a$, we have $\sigma(xa) \leqslant \sigma(x) + 1$.

Lemma 32.3 (Suffix-function recursion lemma) For any string $x$ and character $a$, if $q = \sigma(x)$, then $\sigma(xa) = \sigma(P_qa)$.

Theorem 32.4 If $phi$ is the final-state function of a string-matching automaton for a given pattern $P$ and $T[1..n]$ is an input text for the automaton, then $$\phi(T_i) = \sigma(T_i)$$ for $i = 0, 1, \dots, n$.

Computing the Transition Function

  • Preprocessing: $O(m|\Sigma|)$,
  • Matching: $\Theta(n)$.

32.4 The Knuth-Morris-Pratt Algorithm

Not required.

评论