

Authors: Gupta Ashutosh; Agarwal Suneeta
Publisher: Inderscience Publishers
ISSN: 1744-5485
Source: International Journal of Bioinformatics Research and Applications, Vol. 7, Iss. 2, 2011-05, pp. 115-129
Abstract
This paper introduces a novel algorithm for DNA sequence compression that exploits a transformation and the statistical properties of the transformed sequence. A word-based tagged code is used to identify the end of each codeword. The word-based encoder assigns codes to words according to their frequency distribution. The resulting algorithm is efficient and effective for DNA sequence compression. Because it is a statistical compression method, patterns can be searched for directly in the compressed text, which is useful in knowledge discovery. Experiments show that our algorithm outperforms existing compressors on typical DNA sequence datasets.
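The abstract does not give the details of the encoder, so the following is only a minimal sketch of the general idea of a frequency-ranked, word-based tagged code, not the authors' algorithm. It assumes fixed-length words (the hypothetical `WORD_LEN`), a byte-oriented code in the style of end-tagged dense codes where the high bit marks the last byte of a codeword, and it omits the paper's transformation step entirely. All function names are illustrative.

```python
from collections import Counter

WORD_LEN = 4  # assumption: fixed word length; the abstract does not state one


def split_words(seq, k=WORD_LEN):
    """Cut the sequence into consecutive non-overlapping words of length k."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def tagged_code(rank):
    """Encode a 0-based frequency rank as bytes; the high bit tags the final byte."""
    out = [0x80 | (rank % 128)]      # final byte carries the end-of-code tag
    rank = rank // 128 - 1
    while rank >= 0:                 # leading bytes have the tag bit clear
        out.append(rank % 128)
        rank = rank // 128 - 1
    return bytes(reversed(out))


def compress(seq, k=WORD_LEN):
    """Return (vocabulary ordered by descending frequency, encoded bytes)."""
    words = split_words(seq, k)
    vocab = [w for w, _ in Counter(words).most_common()]
    rank = {w: i for i, w in enumerate(vocab)}
    return vocab, b"".join(tagged_code(rank[w]) for w in words)


def decompress(vocab, data):
    """Invert compress(): the tag bit tells us where each codeword ends."""
    words, acc = [], 0
    for b in data:
        if b & 0x80:                 # tagged byte: codeword is complete
            words.append(vocab[acc * 128 + (b & 0x7F)])
            acc = 0
        else:                        # continuation byte
            acc = acc * 128 + b + 1
    return "".join(words)


def find_word(vocab, data, word):
    """Search for a word in the compressed bytes without decompressing."""
    code = tagged_code(vocab.index(word))
    hits, pos = [], data.find(code)
    while pos != -1:
        # A true match must start right after a tagged byte (or at offset 0).
        if pos == 0 or data[pos - 1] & 0x80:
            hits.append(pos)
        pos = data.find(code, pos + 1)
    return hits


vocab, data = compress("ACGTACGTTTGAACGT")
restored = decompress(vocab, data)
offsets = find_word(vocab, data, "TTGA")
```

Frequent words receive the shortest codewords, and because the tag bit marks every codeword boundary, a byte-level search for a codeword can be checked for alignment by inspecting only the preceding byte. This is the property that makes searching inside the compressed text possible in this kind of scheme.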