Comparing Normalization Methods for Differential Expression Analysis on RNA-Sequence Data from Autism Samples

Yu, Susanna
Senior thesis


Fan, Jianqing
Princeton University. Department of Operations Research and Financial Engineering
Class year
Summary note
Identifying genetic variation and differential gene expression in autism patients has been one of the main efforts to understand the underlying causes of autism spectrum disorder. In this study, RNA-sequencing of brain tissue samples from Autism and Control patients records gene-level read counts, which must be normalized across the Autism and Control samples to remove within-lane and between-lane biases. The subsequent identification of differentially expressed genes, which uses an analysis of variance approach, is highly sensitive to this normalization. This study compares three normalization methods: EDASeq, DESeq2, and TMM. The methods are compared by the reproducibility of the differential gene expression results obtained from multiple bootstrap samples normalized by each method. The ultimate goal is to identify a more reliable normalization technique for RNA-sequencing data and to identify genes that are differentially expressed between the Autism and Control samples. By comparing gene frequencies and ranks, we found that DESeq2 normalization yields the highest reproducibility and EDASeq the lowest, with TMM performing similarly to DESeq2. Biological interpretations of the differentially expressed genes are also discussed.
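
The reproducibility comparison described in the summary can be illustrated with a minimal Python sketch. It is not the thesis's code and rests on several assumptions: the function names, `n_boot`, `top_k`, and `seed` are hypothetical; the normalization is a plain median-of-ratios size-factor calculation in the style DESeq2 uses, not a call to the DESeq2 R package; and a Welch-style t statistic stands in for the analysis of variance approach. EDASeq and TMM are not implemented here.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Per-sample size factors via a median-of-ratios calculation.

    counts: genes x samples array of non-negative read counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep genes with a positive count in every sample so the logs are finite.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Log geometric mean of each gene across samples (a pseudo-reference).
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor: median over genes of the sample-to-reference ratio.
    return np.exp(np.median(log_counts - log_ref, axis=0))


def top_gene_frequency(counts, is_case, n_boot=200, top_k=10, seed=0):
    """Fraction of bootstrap samples in which each gene ranks in the top_k.

    Assumes at least two samples in each group.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    case_idx = np.flatnonzero(is_case)
    ctrl_idx = np.flatnonzero(~is_case)
    hits = np.zeros(counts.shape[0])
    for _ in range(n_boot):
        # Resample columns with replacement within each group.
        b_case = rng.choice(case_idx, size=case_idx.size, replace=True)
        b_ctrl = rng.choice(ctrl_idx, size=ctrl_idx.size, replace=True)
        sub = counts[:, np.concatenate([b_case, b_ctrl])]
        # Normalize the resampled counts, then move to a log2 scale.
        norm = np.log2(sub / median_of_ratios_size_factors(sub) + 1.0)
        a, b = norm[:, :b_case.size], norm[:, b_case.size:]
        # Stand-in test statistic (assumption): Welch-style t per gene.
        se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                     + b.var(axis=1, ddof=1) / b.shape[1])
        t = (a.mean(axis=1) - b.mean(axis=1)) / (se + 1e-8)
        # Record the genes ranked highest by absolute statistic.
        hits[np.argsort(-np.abs(t))[:top_k]] += 1
    return hits / n_boot
```

Under these assumptions, a gene whose frequency stays close to 1 is selected in nearly every bootstrap sample. Swapping in a different normalization function and comparing the resulting frequency vectors is one way to mirror the comparison the summary describes.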

Supplementary Information