Description
This book explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. The book starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken from post-genomic biology, especially genomics and proteomics.
The topics examined include standard material such as the Perron-Frobenius theorem, transient and recurrent states, hitting probabilities and hitting times, maximum likelihood estimation, the Viterbi algorithm, and the Baum-Welch algorithm. The book contains discussions of extremely useful topics not usually seen at the basic level, such as ergodicity of Markov processes, Markov Chain Monte Carlo (MCMC), information theory, and large deviation theory for both i.i.d and Markov processes. The book also presents state-of-the-art realization theory for hidden Markov models. Among biological applications, it offers an in-depth look at the BLAST (Basic Local Alignment Search Technique) algorithm, including a comprehensive explanation of the underlying theory. Other applications such as profile hidden Markov models ar
Chapter
1.3 Random Variables Assuming Infinitely Many Values
1.3.2 Markov and Chebycheff Inequalities
1.3.3 Hoeffding's Inequality
1.3.4 Monte Carlo Simulation
1.3.5 Introduction to Cramér's Theorem
Chapter 2. Introduction to Information Theory
2.1 Convex and Concave Functions
2.2.1 Definition of Entropy
2.2.2 Properties of the Entropy Function
2.2.3 Conditional Entropy
2.2.4 Uniqueness of the Entropy Function
2.3 Relative Entropy and the Kullback-Leibler Divergence
Chapter 3. Nonnegative Matrices
3.1 Canonical Form for Nonnegative Matrices
3.1.1 Basic Version of the Canonical Form
3.1.2 Irreducible Matrices
3.1.3 Final Version of Canonical Form
3.1.4 Irreducibility, Aperiodicity, and Primitivity
3.1.5 Canonical Form for Periodic Irreducible Matrices
3.2 Perron-Frobenius Theory
3.2.1 Perron-Frobenius Theorem for Primitive Matrices
3.2.2 Perron-Frobenius Theorem for Irreducible Matrices
PART 2. HIDDEN MARKOV PROCESSES
Chapter 4. Markov Processes
4.1.1 The Markov Property and the State Transition Matrix
4.1.2 Estimating the State Transition Matrix
4.2 Dynamics of Stationary Markov Chains
4.2.1 Recurrent and Transient States
4.2.2 Hitting Probabilities and Mean Hitting Times
4.3 Ergodicity of Markov Chains
Chapter 5. Introduction to Large Deviation Theory
5.2 Large Deviation Property for I.I.D. Samples: Sanov's Theorem
5.3 Large Deviation Property for Markov Chains
5.3.1 Stationary Distributions
5.3.2 Entropy and Relative Entropy Rates
5.3.3 The Rate Function for Doubleton Frequencies
5.3.4 The Rate Function for Singleton Frequencies
Chapter 6. Hidden Markov Processes: Basic Properties
6.1 Equivalence of Various Hidden Markov Models
6.1.1 Three Different-Looking Models
6.1.2 Equivalence between the Three Models
6.2 Computation of Likelihoods
6.2.1 Computation of Likelihoods of Output Sequences
6.2.2 The Viterbi Algorithm
6.2.3 The Baum-Welch Algorithm
Chapter 7. Hidden Markov Processes: The Complete Realization Problem
7.1 Finite Hankel Rank: A Universal Necessary Condition
7.2 Nonsufficiency of the Finite Hankel Rank Condition
7.3 An Abstract Necessary and Sufficient Condition
7.4 Existence of Regular Quasi-Realizations
7.5 Spectral Properties of Alpha-Mixing Processes
7.6 Ultra-Mixing Processes
7.7 A Sufficient Condition for the Existence of HMMs
PART 3. APPLICATIONS TO BIOLOGY
Chapter 8. Some Applications to Computational Biology
8.2 Optimal Gapped Sequence Alignment
8.2.1 Problem Formulation
8.2.2 Solution via Dynamic Programming
8.3.1 Genes and the Gene-Finding Problem
8.3.2 The GLIMMER Family of Algorithms
8.3.3 The GENSCAN Algorithm
8.4 Protein Classification
8.4.1 Proteins and the Protein Classification Problem
8.4.2 Protein Classification Using Profile Hidden Markov Models
9.1 BLAST Theory: Statements of Main Results
9.1.1 Problem Formulations
9.1.2 The Moment Generating Function
9.1.3 Statement of Main Results
9.1.4 Application of Main Results
9.2 BLAST Theory: Proofs of Main Results