Genome-wide association studies for autopolyploids
Genetic studies in autopolyploids and segmental allopolyploids are complicated by higher allele dosage, outbreeding, polysomic inheritance, formation of multivalents, and double reduction. For these reasons, many genetic studies are not conducted directly in these species but in their diploid relatives or in artificially created diploids. This study aims to fully exploit GWAS for autopolyploids and to develop a software package for this purpose.
The Q+K linear mixed model for association mapping
The model is y = μ + Xβ + Sτ + Qv + Zu + e, where y is an n x 1 vector of observed phenotypes and μ is the grand mean (fixed term).
X is an n x p matrix of fixed-effect covariates (other than population structure), where n is the number of individuals and p is the number of covariates, and β is a p x 1 vector of fixed effects.
S is an n x d design matrix for SNP allele effects, where d is the number of columns in the design matrix and depends on the genetic model (discussed below), and τ is a d x 1 vector of SNP allele effects (fixed term).
Q is an n x q design matrix for population structure (entries are 1 or 0 according to whether individual i is a member of a particular subpopulation, or the probability of membership in that subpopulation), where q is the number of subpopulations, and v is a q x 1 vector of population effects (fixed term).
Z is an n x n incidence matrix for the random term, equal to the identity matrix here. u is the n x 1 vector of random effects of the mixed model with Var(u) = G = σ^2gK, where K is the n x n relatedness (kinship) matrix (see below for the types used) and σ^2g is the additive genetic variance.
e is an n x 1 vector of residual effects.
The notation above applies when a single phenotypic observation y is recorded for each individual. Kang et al. (2008) describe the model for the case where one or more individuals have repeated measurements.
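In this case, u and e are assumed to follow Gaussian distributions with zero mean and variances Var(u) = σ^2gK and Var(e) = σ^2eI, where σ^2e is the residual variance and I is the identity matrix.
Overall variance: Var(y) = V = σ^2gZKZ' + σ^2eI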
The best linear unbiased estimates (BLUE) of τ, β, v and best linear unbiased predictors (BLUP) of u were obtained by solving the mixed-model equations. The F-test is performed after maximization of the following likelihood
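As a minimal sketch of how the mixed-model equations yield the BLUE and BLUP, the Python function below solves Henderson's equations for known variance components. The function name `solve_mme`, the stacked fixed-effect matrix `W` (intercept, X, S and Q columns side by side), and the use of NumPy are assumptions made for illustration; they are not taken from the GWASpoly package.

```python
import numpy as np

def solve_mme(y, W, Z, K, sigma2_g, sigma2_e):
    """Solve Henderson's mixed-model equations for known variance components.

    W stacks all fixed-effect columns (hypothetical layout: intercept, X, S, Q).
    K must be invertible. Returns (fixed-effect estimates, predicted u).
    """
    lam = sigma2_e / sigma2_g          # variance ratio sigma2_e / sigma2_g
    K_inv = np.linalg.inv(K)
    p = W.shape[1]
    # Coefficient matrix [[W'W, W'Z], [Z'W, Z'Z + lam * K^-1]]
    lhs = np.block([[W.T @ W, W.T @ Z],
                    [Z.T @ W, Z.T @ Z + lam * K_inv]])
    rhs = np.concatenate([W.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]            # BLUE of fixed effects, BLUP of u
```

The fixed-effect part of the solution coincides with the generalized least squares estimate under V = σ^2gZKZ' + σ^2eI, which gives a convenient check on any implementation.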
To speed up computation, the two-step "population parameters previously determined" (P3D) method was used. P3D does not require iterations to re-estimate population parameters, including the genetic variance and residual variance, for every marker.
F-test
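A hedged sketch of a marker F-test under P3D: the variance ratio is estimated once without markers, H = ZKZ' + (σ^2e/σ^2g)I is inverted once, and each marker is then tested by generalized least squares. The function name `p3d_f_test`, the argument `W0` (all non-marker fixed-effect columns), and the exact form of the statistic are assumptions for illustration and may differ from the expression used in this study.

```python
import numpy as np

def p3d_f_test(y, W0, S, H_inv):
    """Wald F statistic for H0: tau = 0, reusing a precomputed H^-1 (P3D).

    W0: non-marker fixed-effect columns (hypothetical: intercept, X, Q).
    S:  n x d marker design matrix; [W0, S] must have full column rank.
    Returns (F, numerator df, denominator df).
    """
    W = np.hstack([W0, S])
    n, k = W.shape
    d = S.shape[1]
    A_inv = np.linalg.inv(W.T @ H_inv @ W)
    coef = A_inv @ (W.T @ H_inv @ y)       # GLS estimates of all fixed effects
    resid = y - W @ coef
    df2 = n - k
    s2 = (resid @ H_inv @ resid) / df2     # residual variance scale
    tau = coef[-d:]                        # marker effects are the last d entries
    cov_tau = s2 * A_inv[-d:, -d:]
    F = float(tau @ np.linalg.solve(cov_tau, tau)) / d
    return F, d, df2
```

The p-value would then be the upper tail of an F distribution with (d, n - k) degrees of freedom. Because `H_inv` is computed once and shared across markers, only the small k x k system changes from marker to marker.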
Autopolyploid genetic models
Different genetic models are possible in autopolyploids depending upon ploidy level.
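To illustrate how the genetic model determines the number of columns d in the SNP design matrix S, the sketch below builds S from allele dosages for three example models. The function name `snp_design`, the model labels, and the choice of these three models are assumptions for illustration; the full set of models considered in the study is not listed in this text.

```python
import numpy as np

def snp_design(dosage, ploidy=4, model="additive"):
    """Build an n x d SNP design matrix from allele dosages (0..ploidy)."""
    dosage = np.asarray(dosage)
    if model == "additive":
        # One column: effect proportional to dosage (d = 1)
        return dosage.reshape(-1, 1).astype(float)
    if model == "simplex-dominant":
        # One column: a single copy of the allele gives the full effect (d = 1)
        return (dosage >= 1).reshape(-1, 1).astype(float)
    if model == "general":
        # One indicator column per non-zero dosage class (d = ploidy);
        # a class absent from the sample yields an all-zero column to drop
        levels = np.arange(1, ploidy + 1)
        return (dosage[:, None] == levels[None, :]).astype(float)
    raise ValueError(f"unknown model: {model}")
```

For a tetraploid, the additive and simplex-dominant codings give d = 1, while the general coding gives up to d = 4, so the numerator degrees of freedom of the marker test change with the model.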
Relationship matrix
(a) Identity by state (IBS) similarity method
For an organism with ploidy Ψ and individuals i and j carrying alleles x1, x2, x3, x4, ..., xΨ and y1, y2, y3, y4, ..., yΨ, the IBS similarity can be calculated as:
(b) Realized relationship matrix
K = MM', where M is the marker matrix with allele dosages in {0, 1, ..., Ψ} and Ψ is the ploidy level (4 for tetraploids)
(c) Gaussian Kernel
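The three relationship matrices can be sketched from an n x m matrix of allele dosages as follows. Several details here are assumptions rather than the definitions used in the study: the IBS similarity is taken as the fraction of allele pairs (one allele from each individual) that are identical in state at a biallelic marker, averaged over markers; the realized matrix is computed after centering marker columns and is scaled to a mean diagonal of 1; and the Gaussian kernel uses exp(-(D/θ)^2) on Euclidean distances scaled to [0, 1]. All function names are hypothetical.

```python
import numpy as np

def ibs_matrix(M, ploidy):
    """IBS similarity (assumed definition): share of matching allele pairs."""
    M = np.asarray(M, dtype=float)
    alt = ploidy - M                       # dosage of the alternate allele
    matches = M @ M.T + alt @ alt.T        # matching pairs summed over markers
    return matches / (ploidy ** 2 * M.shape[1])

def realized_matrix(M):
    """K = MM' on column-centered dosages, scaled to mean diagonal 1 (assumed)."""
    M = np.asarray(M, dtype=float)
    Mc = M - M.mean(axis=0)
    K = Mc @ Mc.T
    return K / np.mean(np.diag(K))

def gaussian_kernel(M, theta=1.0):
    """Gaussian kernel on scaled Euclidean distances (assumed form)."""
    M = np.asarray(M, dtype=float)
    diff = M[:, None, :] - M[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    D = D / D.max()                        # requires at least two distinct rows
    return np.exp(-(D / theta) ** 2)
```

Under the assumed IBS definition, a heterozygous individual has self-similarity below 1, which is one reason alternative definitions exist; the choice should follow the formula actually adopted for the analysis.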
The methods have been evaluated using both simulated and experimental data, and the R package GWASpoly has been developed.
Population structure
Let C(i,j) be the dosage of the reference allele, with rows indexed by individuals and columns indexed by polymorphic markers. Let Φ(j) be the mean of column j and p(j) = Φ(j)/Ψ be the allele frequency for marker j; then the normalized matrix can be written as:
Then compute an eigenvector decomposition of Γ to extract principal component scores.
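A minimal sketch of this principal component step is given below. The normalization (centering by Φ(j) and dividing by the square root of Ψ p(j) (1 - p(j))) and the definition of Γ as the n x n cross-product of the normalized matrix divided by the number of markers are assumptions modeled on common practice for diploids; the function name `structure_pcs` is hypothetical.

```python
import numpy as np

def structure_pcs(C, ploidy=4, n_pcs=3):
    """Principal component scores from a dosage matrix C (individuals x markers)."""
    C = np.asarray(C, dtype=float)
    phi = C.mean(axis=0)                   # column means, Phi(j)
    p = phi / ploidy                       # allele frequencies, p(j)
    keep = (p > 0) & (p < 1)               # drop monomorphic markers
    # Assumed normalization: center, then scale by sqrt(ploidy * p * (1 - p))
    N = (C[:, keep] - phi[keep]) / np.sqrt(ploidy * p[keep] * (1 - p[keep]))
    gamma = N @ N.T / keep.sum()           # assumed definition of Gamma
    evals, evecs = np.linalg.eigh(gamma)   # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_pcs]
    return evecs[:, order], evals[order]
```

The leading columns of the returned eigenvector matrix could then serve as the columns of Q in the mixed model, with the number of components chosen by the analyst.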
Another novel method used was Discriminant Analysis of Principal Components (DAPC), implemented in the R package adegenet.
Posters and Presentations:
Rosyara U.R., and J.B. Endelman. 2015. Genome-wide association studies for autopolyploids. Plant and Animal Genomes XIII, Jan 10-14, San Diego, CA.
Rosyara U.R., and J.B. Endelman. 2014. Development and application of genome-wide association studies for autotetraploid potato. Potato Association of America Annual Meeting, July 27–31, Spokane, WA.