Genome-wide association scans for secondary traits using case-control samples.
Academic Article
Abstract
Genome-wide association studies (GWAS) require considerable investment, so researchers often study multiple traits collected on the same set of subjects to maximize return. However, many GWAS have adopted a case-control design, and failing to account properly for case-control ascertainment can lead to biased estimates of association between markers and secondary traits. We show that, under the null hypothesis of no marker-secondary trait association, naïve analyses that either ignore ascertainment or stratify on case-control status have proper Type I error rates, except when both the marker and the secondary trait are independently associated with disease risk. Under the alternative hypothesis, these methods are unbiased when the secondary trait is not associated with disease risk. We also show that inverse-probability-of-sampling-weighted (IPW) regression provides unbiased estimates of marker-secondary trait association. We use simulation to quantify the Type I error, power and bias of naïve and IPW methods. IPW regression has appropriate Type I error in all situations we consider, but has lower power than naïve analyses. The bias of naïve analyses is small provided the marker is independent of disease risk. Because the majority of markers tested in a GWAS are not associated with disease risk, naïve analyses provide valid tests and nearly unbiased estimates of marker-secondary trait association. Care must be taken when there is evidence that both the secondary trait and the tested marker are associated with the primary disease, a situation we illustrate with an analysis of the relationship between a marker in FGFR2 and mammographic density in a breast cancer case-control sample.
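The contrast between naïve and IPW analyses can be illustrated with a small simulation. The sketch below is not the authors' simulation code: the population model (an additive marker with an assumed allele frequency, a normally distributed secondary trait, and a logistic disease model in which both the marker and the trait raise risk), all parameter values, and the helper names `simulate`, `weighted_slope` and `stratified_slope` are hypothetical choices made for illustration. All cases and a fixed fraction of controls are sampled with known probabilities. The naïve estimate ignores ascertainment, the stratified estimate adds a separate intercept for cases and controls, and the IPW estimate weights each sampled subject by the inverse of its sampling probability.

```python
import math
import random


def expit(x):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def weighted_slope(xs, ys, ws):
    """Slope from weighted least squares of y on x, with an intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


def stratified_slope(xs, ys, ds):
    """OLS slope of y on x with separate intercepts for cases and controls."""
    sxy = sxx = 0.0
    for stratum in (0, 1):
        idx = [i for i, d in enumerate(ds) if d == stratum]
        mx = sum(xs[i] for i in idx) / len(idx)
        my = sum(ys[i] for i in idx) / len(idx)
        sxy += sum((xs[i] - mx) * (ys[i] - my) for i in idx)
        sxx += sum((xs[i] - mx) ** 2 for i in idx)
    return sxy / sxx


def simulate(n_pop=200_000, maf=0.3, beta=0.0, a0=-3.0, a_g=1.0, a_y=1.0,
             p_case=1.0, p_control=0.15, seed=1):
    """Hypothetical population model; beta is the true marker-trait slope."""
    rng = random.Random(seed)
    g_s, y_s, d_s, w_s = [], [], [], []
    for _ in range(n_pop):
        # Additive genotype coded 0/1/2 under Hardy-Weinberg equilibrium.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        # Secondary trait: linear in the marker plus standard normal noise.
        y = beta * g + rng.gauss(0.0, 1.0)
        # Disease risk depends on BOTH the marker and the secondary trait.
        d = int(rng.random() < expit(a0 + a_g * g + a_y * y))
        # Case-control ascertainment with known sampling probabilities.
        pi = p_case if d else p_control
        if rng.random() < pi:
            g_s.append(g)
            y_s.append(y)
            d_s.append(d)
            w_s.append(1.0 / pi)  # inverse-probability-of-sampling weight
    naive = weighted_slope(g_s, y_s, [1.0] * len(g_s))  # ignores ascertainment
    strat = stratified_slope(g_s, y_s, d_s)             # adjusts for case status
    ipw = weighted_slope(g_s, y_s, w_s)                 # IPW regression
    return naive, strat, ipw


if __name__ == "__main__":
    naive, strat, ipw = simulate()
    print(f"naive={naive:.3f}  stratified={strat:.3f}  ipw={ipw:.3f}")
```

Because `beta=0.0` here while both `a_g` and `a_y` are nonzero, this is the situation the abstract flags: with these illustrative settings one would expect the naïve and stratified slopes to move away from zero (in opposite directions), while the IPW slope stays close to zero. Setting `a_g=0.0` or `a_y=0.0` should shrink the naïve bias, in line with the conditions described above. Only point estimates are computed; a real analysis would also need valid (for IPW, robust) standard errors.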