

Publisher: Cambridge University Press
E-ISSN: 1475-6064
ISSN: 0001-8678
Source: Advances in Applied Probability, Vol. 44, Iss. 2, 2012-06, pp. 408-428
Abstract
Many applications in genetic analyses utilize sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation, such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for many decades. However, no exact formula is currently known for more general models of mutation that are of biological interest. In this paper, models with finitely many alleles are considered, and an urn construction related to the coalescent is used to derive approximate closed-form sampling formulae for an arbitrary irreducible recurrent mutation model or for a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four, respectively. It is demonstrated empirically that the formulae derived here are highly accurate when the per-base mutation rate is low, which holds for many biological organisms.
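
For context, the classical closed-form result for the infinite-alleles model referred to in the abstract is the Ewens sampling formula. The sketch below is an illustration of that known special case only, not of the approximate formulae derived in the paper. The function name `ewens_probability`, the encoding of a sample as a list of counts, and the use of a single scaled mutation parameter `theta` are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(counts, theta):
    """Ewens sampling formula for the infinite-alleles model.

    counts[j-1] is the number of allele types that appear exactly j times
    in the sample, so the sample size is n = sum(j * counts[j-1]).
    theta is the scaled mutation parameter (illustrative parameterization).
    """
    theta = Fraction(theta)
    n = sum(j * a for j, a in enumerate(counts, start=1))

    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i

    # n! / rising * product over j of theta^a_j / (j^a_j * a_j!).
    prob = Fraction(factorial(n)) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (Fraction(j) ** a * factorial(a))
    return prob


# All allele-frequency configurations for a sample of size 3.
configs = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
probs = [ewens_probability(c, 1) for c in configs]
print(probs, sum(probs))
```

Exact rational arithmetic is used here so that the probabilities over all configurations of a fixed sample size can be checked to sum to one without floating-point error.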