Princeton University Library Catalog

Can Computers Learn from Emoticons? Sentiment Analysis Through Supervised Machine Learning Techniques

Morin, Valerie
Senior thesis
Kpotufe, Samory K.
Princeton University. Department of Operations Research and Financial Engineering
Princeton University. Program in Applications of Computing
Class year:
Summary note:
With the rise of social media have come innovative breakthroughs in sentiment analysis, owing to the wealth of textual data now readily available. However, sentiment analysis has long suffered from a shortage of labeled data, because manual labeling is costly. As a result, much research has turned to unsupervised techniques instead, while research that continues to use supervised techniques accepts a potential loss in accuracy from training on only the small labeled fraction of the data.

There are, however, emotional signals already embedded in social media text that most sentiment analysis methods have overlooked. Xia Hu, Jiliang Tang, Huiji Gao, and Huan Liu set out to determine whether these signals could help sentiment analysis, and they statistically verified that emotional signals such as emoticons are indeed representative of manually labeled sentiment [1].

In this thesis, I used text containing emoticons as a novel labeled dataset on which to train a sentiment analyzer with supervised machine learning techniques. By treating emoticons as sentiment labels for the text, I greatly expanded the training set, a process known to improve classification accuracy [2]. I employed several feature extraction methods, including n-grams, term frequency-inverse document frequency (TF-IDF), and doc2vec, to vectorize the Stanford Twitter Sentiment140 dataset augmented with my additional emoticon-labeled data. Using these features, I trained an array of supervised learning classifiers and a convolutional neural network to determine sentiment polarity, with the aim of improving accuracy in a widely applicable field.
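The core idea of treating emoticons as sentiment labels, followed by n-gram TF-IDF vectorization, can be illustrated with a minimal standard-library sketch. Everything here is an assumption for illustration, not the thesis's documented procedure: the emoticon sets, the hypothetical helper names `emoticon_label`, `ngrams`, and `tfidf`, the choice to skip texts with no emoticon or mixed polarity, the removal of emoticons from the text (so a classifier cannot simply learn the label token), and the TF-IDF variant (relative term frequency times unsmoothed `log(N / df)`).

```python
import math
import re
from collections import Counter

# Hypothetical emoticon lexicons; the thesis's actual emoticon sets are not given.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}
# Longest emoticons first so longer patterns win during regex alternation.
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(POSITIVE | NEGATIVE, key=len, reverse=True))
)


def emoticon_label(text):
    """Return (cleaned_text, label), with label 1 = positive and 0 = negative,
    or None if the text has no emoticon or contains both polarities."""
    found = EMOTICON_RE.findall(text)
    pos = any(e in POSITIVE for e in found)
    neg = any(e in NEGATIVE for e in found)
    if pos == neg:  # no emoticons at all, or mixed signals: treat as unlabeled
        return None
    # Assumption: strip the emoticons so the label is not also a feature.
    cleaned = " ".join(EMOTICON_RE.sub(" ", text).split())
    return cleaned, 1 if pos else 0


def ngrams(text, n_max=2):
    """Lowercased word n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def tfidf(docs, n_max=2):
    """Sparse TF-IDF vectors as dicts: (count / doc length) * log(N / df)."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: each term counted once per doc
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({t: (k / total) * math.log(n_docs / df[t]) for t, k in c.items()})
    return vectors


if __name__ == "__main__":
    tweets = ["I love this phone :)", "worst day ever :(", "no emoticon here", "meh :) :("]
    labeled = [r for r in map(emoticon_label, tweets) if r is not None]
    print(labeled)  # → [('I love this phone', 1), ('worst day ever', 0)]
    texts, labels = zip(*labeled)
    vectors = tfidf(list(texts))  # one sparse feature dict per labeled text
```

In practice, a library vectorizer such as scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` would replace the hand-written `tfidf`, and the resulting vectors and labels would feed the supervised classifiers; the sketch only makes the labeling-and-vectorizing step explicit.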