GWAS1: GWAA; Data Pre-processing
Title: GWAA; Data Pre-processing
Author: Sara Nunez, Tu Dao
File Size: 926.5 KB
Description: Pre-processing genome-wide association study (GWAS) data involves many steps. This module contains 4 labs that walk users through filtering the data to ensure a sound analysis. The first lab, "Understanding Genetic Data Structure", can be used to introduce the common format of genetic data. It walks students through reading in a small data set of SNPs from the web, exploring its components, and subsetting them. The second lab, "SNP level filtering (part I)", teaches students why SNPs need to be filtered and provides code to do so. It covers call rate and minor allele frequency filtering. The third lab, "Sample level filtering", can be used to teach students how to filter samples on missingness and heterozygosity. It does this by looking at sample call rate (similar to the SNP level call rate in lab 2), heterozygosity, and population substructure/ancestry through principal component analysis. The final lab returns to SNP level filtering and teaches the user how to filter on the Hardy-Weinberg Equilibrium (HWE) criterion. HWE is often used to further detect population substructure or genotyping errors. All labs contain established code for the filtering processes as well as exercises. In most cases, students are asked to reproduce what they learned on a separate data set at the end of the lab. Accompanying slides are also provided as a basic backdrop for the concepts involved in this module.
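The SNP level filters named in the description (call rate, minor allele frequency, and HWE) can be illustrated with a short sketch. This is not the labs' own code. It is a minimal Python illustration in which the function names, the thresholds (0.95 for call rate, 0.01 for minor allele frequency), and the toy genotype matrix are all assumptions made for the example. Genotypes are assumed to be coded as 0, 1, or 2 copies of one allele, with None marking a missing call.

```python
# Illustrative sketch only: names, thresholds, and data are assumptions,
# not the code distributed with the labs.

def called_genotypes(genotypes, j):
    """Return the non-missing genotypes (0, 1, or 2) for SNP column j."""
    return [row[j] for row in genotypes if row[j] is not None]

def call_rate(genotypes, j):
    """Fraction of samples with a non-missing genotype at SNP j."""
    return len(called_genotypes(genotypes, j)) / len(genotypes)

def minor_allele_freq(genotypes, j):
    """Frequency of the less common allele among called genotypes at SNP j."""
    called = called_genotypes(genotypes, j)
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p)

def hwe_chi_square(genotypes, j):
    """Chi-square statistic comparing observed genotype counts at SNP j
    with the counts expected under Hardy-Weinberg Equilibrium."""
    called = called_genotypes(genotypes, j)
    n = len(called)
    if n == 0:
        return 0.0
    p = sum(called) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0  # monomorphic SNP: no departure can be measured
    observed = [called.count(g) for g in (0, 1, 2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def filter_snps(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return the column indices of SNPs passing both thresholds."""
    n_snps = len(genotypes[0])
    return [
        j for j in range(n_snps)
        if call_rate(genotypes, j) >= min_call_rate
        and minor_allele_freq(genotypes, j) >= min_maf
    ]

# Toy data: 4 samples (rows) by 4 SNPs (columns).
genotypes = [
    [0, 1, None, 0],
    [1, 1, None, 0],
    [2, 0, 0,    0],
    [0, 1, None, 0],
]

kept = filter_snps(genotypes)
print(kept)  # prints [0, 1]
```

In this toy example the third SNP is dropped for its low call rate (0.25) and the fourth because it is monomorphic (minor allele frequency of 0). The HWE statistic would typically be compared against a chi-square distribution with one degree of freedom. The cutoff used for that comparison is a separate analysis choice and is not fixed by this sketch.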