TOWARDS FAIR MACHINE LEARNING ALGORITHMS FOR MHC BINDING AND ANTIGEN IDENTIFICATION.

Author
Glynn, Eric
Format
Book
Language
English
Published/Created
[Princeton, NJ] : Princeton University, 2022.
Description
1 online resource (42 pages)

Details

Library of Congress genre(s)
Rare books genre
Restrictions note
This dissertation is under embargo until 04/20/2023. A digital copy is available for viewing in the Mudd Manuscript Library reading room during the embargo period. If you are interested in this service, please fill out the Ask Us form.
Summary note
  • The immune system protects the body from external threats, and through its adaptive arm detects and eliminates foreign pathogens and cancerous cells. Major histocompatibility complex (MHC) proteins play an essential role in adaptive immunity, as they allow cells to present a sampling of their proteome for immune surveillance by T cells, enabling detection and elimination of virally infected and cancerous cells. Identifying the specific peptides that can be bound by MHC proteins is thus critical for understanding the adaptive immune response to diverse threats. Many computational methods have been developed to predict MHC-peptide binding, and they have enabled researchers to rapidly screen entire viral or cancer genomes in silico for putative T cell antigens. This technology is now used to design personalized immunotherapies that target patient-specific neoantigens to fight cancers. Although MHC binding algorithms are widely used, the extreme polymorphism of the class-I MHC genes, with over 22,000 MHC alleles across human populations, poses significant obstacles to accurate antigen identification across diverse individuals and thus to all downstream research and therapies.
  • In this dissertation, we develop a state-of-the-art machine learning system to predict peptide binding for MHC alleles and introduce the first system to estimate model performance for the tens of thousands of MHC alleles. Our analysis shows that significant differences in the amount of binding data associated with each MHC allele lead to data disparities across racial and ethnic groups. We show that machine learning is able to mitigate some of these disparities, and we introduce an algorithm that prioritizes data collection to address the remaining disparities. As part of this dissertation, we also include an introductory review designed to give computational biologists an accessible entry point into the biological systems associated with antigen recognition; in addition to a biological primer, we cover the many use cases of MHC binding models and the algorithmic innovations underlying them. Taken together, the components of this dissertation advance the state of MHC binding algorithms and will greatly aid the broad spectrum of downstream research and therapeutic applications that use these algorithms to identify T cell antigens in a genetically diverse human population.
Notes
  • This dissertation is deposited in the ProQuest Dissertations and Theses Global database.
  • Advisor: Singh, Mona.
  • Department: Molecular Biology.
  • Categories: Bioinformatics. Biology.
Dissertation note
Ph.D. Princeton University 2022
Other standard number
  • Glynn_princeton_0181D_14011.pdf
Statement on responsible collection description
Princeton University Library aims to describe library materials in a manner that is respectful to the individuals and communities who create, use, and are represented in the collections we manage.