Princeton University Library Catalog

Women’s Impact on GitHub: A Machine Learning Approach

Swenson, Hannah
Senior thesis
LaPaugh, Andrea
Princeton University. Department of Computer Science
Class year:
46 pages
Summary note:
Not only are women underrepresented in computer science, but their percentage in the field has been declining since the 1980s. To analyze women’s potential impact on the programming world, I study female contribution and gender dynamics in Open Source Software (OSS) communities. GitHub, an OSS platform that allows programmers worldwide to collaborate on software projects, is the largest publicly available source of OSS project data. I implement a supervised machine learning algorithm to predict, given certain factors, whether a programming team on GitHub has female members, in the hope of showing positive correlations between having women on the team and project productivity or success. While my linear regression model predicts whether a team has women to a certain degree of accuracy, it is limited by training data in which women are sparse. This sparsity in the dataset is consistent with, and speaks to, the broader issue of female underrepresentation in computer science.