Improved Tuned Iterative ReliefF: A Fast Filtering Method for Human Genetics

Author/Artist
GRANIZO-MACKENZIE, DELANEY
Format
Senior thesis
Language
English
Description
63 pages

Details

Advisor(s)
CHARIKAR, MOSES
Department
Princeton University. Department of Computer Science
Class year
2014
Summary note
Identifying genotypes that are associated with disease phenotypes is a crucial problem in modern human genetics. Genetic data sets contain hundreds of thousands of genes, but only a small number are statistically associated with disease phenotypes. In addition, the phenomenon of epistasis means that there are groups of genes that are not statistically associated with a disease phenotype on their own but are associated with it together. Relief algorithms are effective heuristics for detecting these groups of epistatic genes. They return a score for each gene such that genes associated with a disease phenotype are likely to be scored higher than genes that are not. In this paper we present runtime reductions of 1.42x and 3.15x, respectively, to the two most recent Relief algorithms: MultiSURF* and TURF. These reductions translate to many hours of saved time per algorithm run. A memoization technique allows our new version of TURF to achieve the same success rate in less than a third of the time. We also analyze the parameters for MultiSURF*, as these are currently not formally established. We show that it is difficult to select a parameter that will outperform the current one.
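
As a rough illustration of the scoring idea described in the summary, the following is a minimal sketch of the original Relief heuristic (nearest-hit and nearest-miss updates) on discrete genotypes with a binary phenotype. It is not the thesis's MultiSURF* or TURF implementation and does not include the memoization speedup. The function name relief_scores, the use of Hamming distance, the choice to visit every sample once, and the toy data are all assumptions made for illustration.

```python
def relief_scores(X, y):
    """Basic Relief sketch (illustrative assumption, not the thesis's code).

    X: list of samples, each a list of discrete feature values (e.g. genotypes).
    y: list of binary class labels (e.g. case/control), same length as X.
    Returns one score per feature; higher means more likely relevant.
    """
    n, p = len(X), len(X[0])
    scores = [0.0] * p

    def distance(a, b):
        # Hamming distance: number of features on which two samples differ.
        return sum(u != v for u, v in zip(a, b))

    for i in range(n):
        hit, miss = None, None
        hit_d, miss_d = float("inf"), float("inf")
        for j in range(n):
            if j == i:
                continue
            d = distance(X[i], X[j])
            if y[j] == y[i]:
                if d < hit_d:       # nearest neighbor with the same label
                    hit, hit_d = j, d
            else:
                if d < miss_d:      # nearest neighbor with the other label
                    miss, miss_d = j, d
        if hit is None or miss is None:
            continue                # cannot update without both neighbors
        for f in range(p):
            # Penalize features that differ from the nearest hit,
            # reward features that differ from the nearest miss.
            scores[f] -= (X[i][f] != X[hit][f]) / n
            scores[f] += (X[i][f] != X[miss][f]) / n
    return scores


# Toy epistasis-style data: the label is the XOR of features 0 and 1,
# and feature 2 is irrelevant. All eight combinations appear once.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, c in X]
scores = relief_scores(X, y)
```

On this toy data, neither of the first two features is associated with the label on its own, yet both receive higher scores than the irrelevant third feature, which is the behavior the summary attributes to Relief-style methods.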

Supplementary Information