Space-efficient multiple string matching automata

Author: Zhang Meng, Yang Tianyu, Wu Rui

Publisher: Inderscience Publishers

ISSN: 1741-1084

Source: International Journal of Wireless and Mobile Computing, Vol.5, Iss.3, 2012-07, pp. 308-313

Abstract

The Aho-Corasick (AC) automaton is a data structure for multiple string matching. We present two compression methods that enable the AC automaton to work on systems with limited resources, such as mobile devices. With the first method, the AC automaton for a pattern set P over an alphabet of size σ needs (σ + 1)I + (1 + log|P| + log M)M + o(M) bits, where M and I are the number of states and the number of non-leaf states of the AC automaton, respectively, and a state transition takes O(1) time. With the second method, the space is I + (1 + log|P| + log M + log σ)M + o(M log σ) bits, and a state transition takes O(log log σ) time. We then combine the two methods to achieve trade-offs between space and time complexity.
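To make the quantities in these bounds concrete, the following is a minimal sketch of a standard, uncompressed AC automaton (goto edges, failure links, and per-state output lists). It is not the authors' compressed representation. It counts M (all states) and I (states with at least one outgoing goto edge) and evaluates the leading terms of the two space bounds. The function names `build_ac`, `search`, and `space_bounds` are hypothetical. The sketch also assumes base-2 logarithms, drops the o(M) and o(M log σ) terms, and takes σ as a caller-supplied alphabet size.

```python
from collections import deque
from math import log2


def build_ac(patterns):
    """Build an uncompressed Aho-Corasick automaton (illustrative sketch).

    Returns (children, fail, out): children[s] maps a character to the goto
    target of state s, fail[s] is its failure link, and out[s] lists the
    indices of the patterns recognised on reaching s. State 0 is the root.
    """
    children = [{}]
    out = [[]]
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in children[state]:
                children.append({})
                out.append([])
                children[state][ch] = len(children) - 1
            state = children[state][ch]
        out[state].append(idx)

    # Breadth-first pass: depth-1 states fail to the root; deeper states
    # follow their parent's failure chain until a matching edge is found.
    fail = [0] * len(children)
    queue = deque(children[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in children[s].items():
            f = fail[s]
            while f and ch not in children[f]:
                f = fail[f]
            fail[t] = children[f].get(ch, 0)
            out[t].extend(out[fail[t]])  # inherit matches of the suffix state
            queue.append(t)
    return children, fail, out


def search(automaton, text):
    """Return (end_position, pattern_index) for every occurrence in text."""
    children, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in children[state]:
            state = fail[state]
        state = children[state].get(ch, 0)
        hits.extend((i, p) for p in out[state])
    return hits


def space_bounds(automaton, num_patterns, sigma):
    """Leading terms (in bits) of the two space bounds from the abstract.

    Assumptions: logs are base 2 and the o(.) terms are dropped, so these
    are rough estimates, not exact sizes of the paper's structures.
    """
    children = automaton[0]
    M = len(children)                  # number of states
    I = sum(1 for c in children if c)  # non-leaf states (at least one child)
    per_state = 1 + log2(num_patterns) + log2(M)
    method1 = (sigma + 1) * I + per_state * M
    method2 = I + (per_state + log2(sigma)) * M
    return M, I, method1, method2


patterns = ["he", "she", "his", "hers"]
ac = build_ac(patterns)
print(search(ac, "ushers"))                           # [(3, 1), (3, 0), (5, 3)]
print(space_bounds(ac, len(patterns), sigma=26)[:2])  # (10, 7)
```

In this toy example the σ·I term dominates the first bound, while the second bound pays log σ per state instead. That difference is the space/time trade-off the abstract describes, here visible only in the leading terms.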