Can Computers Learn from Emoticons? Sentiment Analysis Through Supervised Machine Learning Techniques

Author/Artist
Morin, Valerie
Format
Senior thesis
Language
English

Availability

Available Online

Details

Advisor(s)
Kpotufe, Samory K.
Department
Princeton University. Department of Operations Research and Financial Engineering
Certificate
Princeton University. Program in Applications of Computing
Class year
2017
Summary note
The rise of social media has brought breakthroughs in sentiment analysis, owing to the wealth of textual data now readily available. However, sentiment analysis has long suffered from a shortage of labeled data, because manual labeling is costly. As a result, much research has turned to unsupervised techniques, while research that continues to use supervised techniques accepts a loss in potential accuracy from training on only a small fraction of the available data. There are, however, emotional signals already embedded in social media text that most sentiment analysis methods have overlooked. Xia Hu, Jiliang Tang, Huiji Gao, and Huan Liu asked whether these signals could help sentiment analysis, and they statistically verified that emotional signals such as emoticons are indeed representative of manually labeled sentiment [1]. In this thesis, I used text containing emoticons as a novel labeled dataset on which to train a sentiment analyzer through supervised machine learning techniques. By treating emoticons as sentiment labels for the text, I greatly expanded the dataset, a process known to improve classification accuracy [2]. I employed several feature extraction methods, including n-grams, term frequency-inverse document frequency (TF-IDF), and doc2vec, to vectorize the Stanford Twitter Sentiment140 dataset expanded by my additional emoticon-labeled data. Using these features, I trained an array of supervised learning classifiers and a convolutional neural network to determine sentiment polarity, with the aim of improving accuracy in a widely applicable field.
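The idea of treating emoticons as sentiment labels can be illustrated with a minimal sketch. This is not the thesis's code: the emoticon lists, the function names `label_by_emoticon` and `word_ngrams`, and the rule of discarding texts with mixed or missing emoticons are all assumptions made for illustration. The sketch assigns a polarity label from the emoticons a text contains, strips them out so that a classifier cannot simply read the label back, and then extracts word n-grams as simple features.

```python
import re

# Hypothetical emoticon lexicons; the thesis summary does not list the ones it used.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", "=("}

# Try longer emoticons first so a longer match is preferred over a shorter one.
_EMOTICON_RE = re.compile(
    "|".join(
        re.escape(e)
        for e in sorted(POSITIVE | NEGATIVE, key=len, reverse=True)
    )
)


def label_by_emoticon(text):
    """Return (cleaned_text, label), or None if the text has no usable signal.

    Assumption: texts with no emoticon, or with both positive and
    negative emoticons, are discarded rather than labeled.
    """
    found = set(_EMOTICON_RE.findall(text))
    has_pos = bool(found & POSITIVE)
    has_neg = bool(found & NEGATIVE)
    if has_pos == has_neg:  # neither kind present, or both present
        return None
    # Remove the emoticons and collapse the leftover whitespace.
    cleaned = " ".join(_EMOTICON_RE.sub(" ", text).split())
    return cleaned, ("positive" if has_pos else "negative")


def word_ngrams(text, n_max=2):
    """Return lowercase word n-grams for n = 1..n_max, shortest first."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


if __name__ == "__main__":
    for raw in ["great game today :)", "missed my bus :-(", "no signal here"]:
        print(raw, "->", label_by_emoticon(raw))
```

In a fuller pipeline, the labeled pairs produced this way would be vectorized (for example with TF-IDF weights over the n-grams) and passed to a supervised classifier; those steps are omitted here because the summary does not specify how they were configured.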