Predicting Gentrification in Washington, DC Using Housing Prices and Support Vector Machines

Hess, Salena
Senior thesis
81 pages


Ahmadi, Amirali
Princeton University. Department of Operations Research and Financial Engineering
Class year
Summary note
Gentrification has dramatically affected Washington, DC in recent decades, and this thesis examines the causes and effects of gentrification at the census tract level. Though gentrification is typically studied within the humanities and social sciences, this work demonstrates that statistical analysis and machine learning techniques can be used to better understand the phenomenon. First, this thesis hypothesizes that the price of a house can be divided into two parts: the intrinsic value of the house and the aggregate of all external factors related to its location. Several location-sensitive models are explored in order to predict housing prices, and a linear model that treats census tracts as categorical variables is chosen as the best approach for this application. Second, this thesis presents an objective way of identifying gentrification using individual home sale data and the notion of a "desirability index," which is the premium or discount that homebuyers pay for the location of a house. Finally, support vector machines (SVMs) are used to predict future gentrification based on demographic information. This work is specifically concerned with predicting which census tracts will gentrify between 2015 and 2020. Ultimately, the "desirability index" proves to be a useful way to predict housing prices and identify gentrification. Additionally, the SVM model built in this thesis proves to be relatively accurate in the case of Washington, DC, but it is unclear whether the model generalizes well enough to accurately predict gentrification in other cities.
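
The price decomposition described in the summary can be illustrated with a small sketch. This is not the thesis's implementation: the sales records, the single intrinsic feature (square footage), the function names, and the choice to measure each tract's premium against the average tract effect are all assumptions made for illustration. The fit uses within-tract demeaning, which gives the same slope as ordinary least squares with one indicator variable per census tract.

```python
from collections import defaultdict

# Hypothetical sales: (census_tract, square_feet, sale_price). Made-up numbers.
SALES = [
    ("tract_A", 1000, 250_000),
    ("tract_A", 1500, 350_000),
    ("tract_B", 1200, 220_000),
    ("tract_B", 2000, 380_000),
]


def fit_tract_model(sales):
    """Least-squares fit of price = slope * sqft + tract_effect[tract].

    Assumes at least one tract contains sales with differing square footage;
    otherwise the slope is not identified and the division below fails.
    """
    by_tract = defaultdict(list)
    for tract, sqft, price in sales:
        by_tract[tract].append((sqft, price))

    # Per-tract means of the feature and of the price.
    means = {
        tract: (
            sum(x for x, _ in rows) / len(rows),
            sum(y for _, y in rows) / len(rows),
        )
        for tract, rows in by_tract.items()
    }

    # Slope from deviations around each tract's own means.
    num = 0.0
    den = 0.0
    for tract, sqft, price in sales:
        mean_x, mean_y = means[tract]
        num += (sqft - mean_x) * (price - mean_y)
        den += (sqft - mean_x) ** 2
    slope = num / den

    # Each tract's intercept absorbs everything location-related.
    effects = {t: mean_y - slope * mean_x for t, (mean_x, mean_y) in means.items()}
    return slope, effects


def desirability_index(effects):
    """Premium or discount per tract, relative to the average tract effect.

    The baseline is an assumption; the thesis may normalize differently.
    """
    baseline = sum(effects.values()) / len(effects)
    return {tract: effect - baseline for tract, effect in effects.items()}


slope, effects = fit_tract_model(SALES)
index = desirability_index(effects)
```

In this toy data, `slope` plays the role of the intrinsic value per square foot and `index` holds each tract's location premium or discount. Tracking how such an index changes over time is one way a rising tract could be flagged, though the identification rule actually used in the thesis is not given in this record.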

Supplementary Information