

Author: Haiminen Niina, Mannila Heikki
Publisher: Inderscience Publishers
ISSN: 1748-5673
Source: International Journal of Data Mining and Bioinformatics, Vol. 4, Iss. 6, 2010-12, pp. 675-700
Abstract
Segmentation is a general data mining technique for summarising and analysing sequential data. It can be applied, e.g., to study large-scale genomic structures such as isochores. Choosing the number of segments remains a challenging question. We present extensive experimental studies of two model selection techniques, the Bayesian Information Criterion (BIC) and Cross Validation (CV). We successfully identify segments that differ in mean or variance, and demonstrate the effect of linear trends and outliers, both of which occur frequently in real data. Results are given for real DNA sequences, segmented with respect to changes in their codon, G + C, and bigram frequencies, and for copy-number variation in CGH data.
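To make the model selection question concrete, here is a minimal sketch of choosing the number of segments with BIC. It assumes a piecewise-constant Gaussian model with one shared variance, finds the least-squares optimal k-segmentation by the standard O(k n^2) dynamic program, and scores each k as n log(RSS/n) + 2k log n (k means, k - 1 boundaries, one variance). This likelihood and parameter count are illustrative assumptions, not necessarily the exact formulation evaluated in the paper, and the function names are hypothetical. The CV alternative is not shown.

```python
import math


def segment_cost_fn(x):
    """Return cost(i, j): sum of squared deviations of x[i:j] from its mean.

    Prefix sums of x and x**2 make each query O(1).
    """
    s = [0.0] * (len(x) + 1)
    s2 = [0.0] * (len(x) + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return cost


def optimal_segmentation(x, k_max):
    """Dynamic program: best[k][j] is the minimal RSS for splitting x[:j] into k segments."""
    n = len(x)
    cost = segment_cost_fn(x)
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    best[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            # The last segment is x[i:j]; the first k - 1 segments cover x[:i].
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i
    return best, back


def change_points(back, k, n):
    """Trace back the start index of each segment after the first."""
    starts, j = [], n
    for kk in range(k, 0, -1):
        j = back[kk][j]
        starts.append(j)
    return starts[::-1][1:]  # drop the leading 0


def choose_k_bic(x, k_max):
    """Pick the number of segments minimising the (assumed) BIC score."""
    n = len(x)
    best, back = optimal_segmentation(x, k_max)
    scores = {}
    for k in range(1, min(k_max, n) + 1):
        rss = max(best[k][n], 1e-12)  # floor avoids log(0) for a perfect fit
        scores[k] = n * math.log(rss / n) + 2 * k * math.log(n)
    k_best = min(scores, key=scores.get)
    return k_best, change_points(back, k_best, n), scores
```

For a noiseless sequence with two level shifts, `choose_k_bic([0.0] * 20 + [5.0] * 20 + [0.0] * 20, k_max=6)` returns k = 3 with change points `[20, 40]`: beyond three segments the fit cannot improve, so the log n penalty favours the smaller model. On real data with trends or outliers, such a simple Gaussian model can over-segment, which is the kind of effect the abstract reports studying.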