Date of Award

6-2016

Document Type

Open Access

Degree Name

Bachelor of Science

Department

Computer Science

Second Department

History

First Advisor

Nick Webb

Second Advisor

Joyce Madancy

Language

English

Keywords

Japan, visit, accessed, times, China, world

Abstract

Computational methods are used with increasing frequency in the social sciences and humanities, thanks to the availability of digital sources and computing power, to study everything from changes in the meanings of words in Latin texts to how knowledge was categorized in eighteenth-century encyclopedias. Recent trends in digital humanities and computational social science include statistical methods such as machine learning, which require large pre-tagged and annotated sets of documents and therefore a great deal of prior work to create the data. This reliance on large annotated corpora limits the questions and topics one can investigate to those for which such resources already exist or for which significant annotation effort is available. With unannotated corpora, such as those one can gather automatically from the internet using web scraping, a significantly wider range of topics can be addressed with computational methods. Such data can be unstructured or semi-structured, like newspaper articles, movie reviews, or tweets. While the unannotated nature of the data somewhat limits the methods available for analyzing it, a data-augmented approach to history using unannotated corpora is still useful. In this thesis, I study term frequency analysis and sentiment analysis to determine how useful these methods are as an aid to historical analysis. In particular, I use these methods to understand and analyze changes in discourse on one particular historical issue over time.
