Flexible Pattern Matching in Strings :Practical On-Line Search Algorithms for Texts and Biological Sequences

Publication subTitle :Practical On-Line Search Algorithms for Texts and Biological Sequences

Author: Gonzalo Navarro; Mathieu Raffinot  

Publisher: Cambridge University Press‎

Publication year: 2007

E-ISBN: 9780511030130

P-ISBN(Paperback): 9780521039932

Subject: TP301.6 algorithm theory

Keyword: 计算机的应用

Language: ENG

Access to resources Favorite

Disclaimer: Any content in publications that violate the sovereignty, the constitution or regulations of the PRC is not accepted or approved by CNPIEC.

Flexible Pattern Matching in Strings

Description

String matching problems range from the relatively simple task of searching a single text for a string of characters to searching a database for approximate occurrences of a complex pattern. Recent years have witnessed a dramatic increase of interest in sophisticated string matching problems, especially in information retrieval and computational biology. This book presents a practical approach to string matching problems, focusing on the algorithms and implementations that perform best in practice. It covers searching for simple, multiple and extended strings, as well as regular expressions, and exact and approximate searching. It includes all the most significant new developments in complex pattern searching. The clear explanations, step-by-step examples, algorithm pseudocode, and implementation efficiency maps will enable researchers, professionals and students in bioinformatics, computer science, and software engineering to choose the most appropriate algorithms for their applications.

Chapter

1.3.3 Automata

1.3.4 Complexity notations

2 String matching

2.1 Basic concepts

2.2 Prefix based approach

2.2.1 Knuth-Morris-Pratt idea

2.2.2 Shift-And/Shift-Or algorithm

2.3 Suffix based approach

2.3.1 Boyer-Moore idea

2.3.2 Horspool algorithm

2.4 Factor based approach

2.4.1 Backward Dawg Matching idea

2.4.2 Backward nondeterministic Dawg Matching algorithm

2.4.3 Backward Oracle Matching algorithm

2.4.3.1 Factor oracle

2.4.3.2 Search with the factor oracle

2.5 Experimental map

2.6 Other algorithms and references

3 Multiple string matching

3.1 Basic concepts

3.2 Prefix based approach

3.2.1 Multiple Shift-And algorithm

3.2.2 Basic Aho- Corasick algorithm

3.2.3 Advanced Aho- Corasiclc algorithm

3.3 Suffix based approach

3.3.1 Commentz- Walter idea

3.3.2 Set Horspool algorithm

3.3.3 Wu- Manber algorithm

3.4 Factor based approach

3.4.1 Multiple BNDM algorithm

3.4.2 Set Backward Dawg Matching idea

3.4.2.1 Suffix automaton for a set of strings

3.4.2.2 Search algorithm

3.4.3 Set Backward Oracle Matching algorithm

3.4.3.1 Factor oracle of a set of strings

3.4.3.2 Search with the factor oracle

3.5 Experimental maps

3.6 Other algorithms and references

4 Extended string matching

4.1 Basic concepts

4.2 Classes of characters

4.2.1 Classes in the pattern

4.2.2 Classes in the text

4.3 Bounded length gaps

4.3.1 Extending Shift-And

4.3.2 Extending BNDM

4.4 Optional characters

4.5 Wild cards and repeatable characters

4.5.1 Extended Shift-And

4.5.2 Extended BNDM

4.6 Multipattern searching

4.7 Other algorithms and references

5 Regular expression matching

5.1 Basic concepts

5.2 Building an NFA

5.2.1 Thompson automaton

5.2.2 Glushkov automaton

5.3 Classical approaches to regular expression searching

5.3.1 Thompson's NFA simulation

5.3.2 Using a deterministic automaton

5.3.3 A hybrid approach

5.4 Bit-parallel algorithms

5.4.1 Bit-parallel Thompson

5.4.2 Bit-parallel Glushkov

5.5 Filtration approaches

5.5.1 Multistring matching approach

5.5.2 Gnu's heuristic based on necessary factors

5.5.3 An approach based on BNDM

5.6 Experimental map

5.7 Other algorithms and references

5.8 Building a parse tree

6 Approximate matching

6.1 Basic concepts

6.2 Dynamic programming algorithms

6.2.1 Computing edit distance

6.2.2 Text searching

6.2.3 Improving the average case

6.2.4 Other algorithms based on dynamic programming

6.3 Algorithms based on automata

6.4 Bit-parallel algorithms

6.4.1 Parallelixing the NFA

6.4.1.1 Row-wise bit-parallelism

6.4.1.2 Diagonal-wise bit-parallelism

6.4.2 Parallelizing the DP matrix

6.5 Algorithms for fast filtering the text

6.5.1 Partitioning into k + 1 pieces

6.5.2 Approximate BNDM

6.5.3 Other filtration algorithms

6.6 Multipattern approximate searching

6.6.1 A hashing based algorithm for one error

6.6.2 Partitioning into k + 1 pieces

6.6.3 Superimposed automata

6.7 Searching for extended strings and regular expressions

6.7.1 A dynamic programming based approach

6.7.2 A Four-Russians approach

6.7.3 A bit-parallel approach

6.8 Experimental map

6.9 Other algorithms and references

7 Conclusion

7.1 Available software

7.1.1 Gnu Grep

7.1.2 Wu and Manber's Agrep

7.1.3 Navarro's Nrgrep

7.1.4 Mehldau and Myers' Anrep

7.1.5 Other resources for computational biology

7.2 Other books

7.2.1 Books on string matching

7.2.2 Books on computational biology

7.3 Other resources

7.3.1 Journals

7.3.2 Conferences

7.3.3 On-line resources

7.4 Related topics

7.4.1 Indexing

7.4.1.1 General indexes

7.4.1.2 Indexes for natural language

7.4.2 Searching compressed text

7.4.2.1 Compression algorithms

7.4.2.2 On-line pattern matching in compressed text

7.4.2.3 Indexed pattern matching in compressed text

7.4.3 Repeats and repetitions

7.4.3.1 Exact repetitions

7.4.3.2 Approximate repetitions

7.4.4 Pattern matching in two and more dimensions

7.4.4.1 Two-dimensional pattern matching

7.4.4.2 Other multidimensional matching problems

7.4.5 Tree pattern matching

7.4.6 Sequence comparison

7.4.7 Meaningful string occurrences

Bibliography

Index

Symbols

A

B

C

D

E

F

G

H

I

K

L

M

N

O

P

R

S

T

U

V

W

Y

The users who browse this book also browse


No browse record.