assignPOP: An R package for population assignment using genetic, non‐genetic, or integrated data in a machine‐learning framework

Publisher: John Wiley & Sons Inc

E-ISSN: 2041-210X

ISSN: 2041-210X

Source: METHODS IN ECOLOGY AND EVOLUTION, Vol. 9, Iss. 2, 2018-02, pp. 439-446

Abstract

The use of biomarkers (e.g., genetic, microchemical and morphometric characteristics) to discriminate among populations and assign individuals to them can benefit species conservation and management by improving our understanding of population structure and demography.

Tools that can evaluate the reliability of large genomic datasets for population discrimination and assignment, and that allow their integration with non‐genetic markers for the same purpose, are lacking. Our R package, assignPOP, provides both functions in a supervised machine‐learning framework.

assignPOP uses Monte‐Carlo and K‐fold cross‐validation procedures, as well as principal component analysis, to estimate assignment accuracy and membership probabilities, using independent training (i.e., baseline source population) and test (i.e., validation) datasets. A user can then build a specified predictive model based on the relative sizes of these datasets and classification functions, including linear discriminant analysis, support vector machine, naïve Bayes, decision tree and random forest.

assignPOP can benefit any researcher who seeks to use genetic or non‐genetic data to infer population structure and membership of individuals. assignPOP is a freely available R package under the GPL license, and can be downloaded from CRAN or at https://github.com/alexkychen/assignPOP. A comprehensive tutorial can also be found at https://alexkychen.github.io/assignPOP/.
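The two cross‐validation schemes named in the abstract differ in how individuals are divided into training and test sets. The sketch below illustrates that difference only; it is not assignPOP code and does not use the package's API. It is written in Python with the standard library, and every name in it (`make_toy_data`, `fit_centroids`, `monte_carlo_cv`, `k_fold_cv`) is hypothetical. It also makes several simplifying assumptions: two simulated populations, two numeric markers per individual, a nearest‐centroid classifier standing in for the classification functions listed above, and plain random splits that are not stratified by population.

```python
import random
from statistics import mean


def make_toy_data(n_per_pop=30, seed=1):
    """Simulate two hypothetical populations with two numeric markers each."""
    rng = random.Random(seed)
    data = []
    for label, centre in (("popA", (0.0, 0.0)), ("popB", (3.0, 3.0))):
        for _ in range(n_per_pop):
            features = tuple(rng.gauss(c, 1.0) for c in centre)
            data.append((features, label))
    return data


def fit_centroids(train):
    """Stand-in classifier: store the mean marker vector of each population."""
    by_pop = {}
    for features, label in train:
        by_pop.setdefault(label, []).append(features)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in by_pop.items()
    }


def predict(centroids, features):
    """Assign an individual to the population with the nearest centroid."""
    def sq_dist(centre):
        return sum((f - c) ** 2 for f, c in zip(features, centre))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(train, test):
    """Proportion of test individuals assigned to their true population."""
    centroids = fit_centroids(train)
    hits = sum(predict(centroids, f) == label for f, label in test)
    return hits / len(test)


def monte_carlo_cv(data, train_fraction=0.7, iterations=20, seed=2):
    """Repeated random resampling: each iteration draws a fresh split."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        scores.append(accuracy(shuffled[:cut], shuffled[cut:]))
    return scores


def k_fold_cv(data, k=5, seed=3):
    """Partition into k folds so each individual is tested exactly once."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train, test))
    return scores


if __name__ == "__main__":
    toy = make_toy_data()
    print("Monte-Carlo mean accuracy:", round(mean(monte_carlo_cv(toy)), 3))
    print("K-fold mean accuracy:", round(mean(k_fold_cv(toy)), 3))
```

In this sketch, the Monte‐Carlo routine may test the same individual in several iterations, so its spread of scores reflects how sensitive accuracy is to the choice of training individuals. The K‐fold routine tests every individual exactly once, which is the natural basis for a per‐individual result such as a membership probability.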