Using Topic Modeling for Historical Research
Bookworm
Using topic modeling is like using Google as a search engine. It can save so much time and effort otherwise spent poring through corpus after corpus of data, literature, and text. By entering a few key phrases you can identify trends, examine texts, and find resources that fit the parameters you set. Despite a few setbacks, such as computers not recognizing polysemous words (words with more than one meaning) or even possessive nouns, tools like this can be a major asset in your research. Below is an image of a search conducted on Bookworm. Bookworm allows you to plug in a few key words and see how frequently those words occur in the literature, research, and articles within a corpus of text. You can adjust the time period by using the "Metric" feature, and you can adjust case sensitivity by using the case button. Literature can be arranged by language, location, author, date, or any number of other factors by clicking the drop-down menu next to your key words in the search box.
Overall, I think that Bookworm is useful for determining how relevant your topic is within a corpus of text, and it is extremely easy to use. To be honest, the tutorial that comes with the HathiTrust corpus does help, but the tool is relatively straightforward on its own. This is helpful for historians who want to understand the scope of the research already done on their chosen topic within a corpus of text. As you can see, there are relatively few works on Mussolini and Fascism, but a great deal on Architecture and Ancient Rome.
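To make the idea of keyword frequency concrete, here is a minimal Python sketch of the kind of counting a tool like Bookworm performs. It is not Bookworm's actual code: the function name `yearly_frequency`, the per-million-words measure, and the three-line toy corpus are all assumptions made up for illustration.

```python
import re
from collections import Counter, defaultdict


def yearly_frequency(docs, keywords, case_sensitive=False):
    """Return {keyword: {year: occurrences per million words}}.

    docs is an iterable of (year, text) pairs. This is a hypothetical
    helper for illustration, not part of Bookworm.
    """
    totals = Counter()           # total words seen per year
    hits = defaultdict(Counter)  # keyword -> year -> raw count
    for year, text in docs:
        if not case_sensitive:
            text = text.lower()
        tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
        totals[year] += len(tokens)
        counts = Counter(tokens)
        for kw in keywords:
            key = kw if case_sensitive else kw.lower()
            hits[kw][year] += counts[key]
    return {
        kw: {
            year: 1_000_000 * hits[kw][year] / totals[year]
            for year in sorted(totals)
            if totals[year]
        }
        for kw in keywords
    }


# Made-up toy corpus: (publication year, text)
docs = [
    (1930, "Rome and the architecture of Rome"),
    (1930, "Fascism in Italy"),
    (1950, "Ancient architecture of Rome"),
]
result = yearly_frequency(docs, ["Rome", "Fascism"])
for keyword, by_year in result.items():
    print(keyword, by_year)
```

Dividing by the total number of words in each year is what lets you compare years fairly, since a year with more books would otherwise always look like it has more mentions.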
MALLET
MALLET will require two things from you: time and patience. It requires quite a bit of setup before you can run the program, and once you do get it downloaded you will find out that you also need to download the Java Development Kit, which sets a rather frustrating mood for MALLET: just when you think you have it all figured out, something else pops up. When you do get MALLET up and running you will be working in Command Prompt, and you will feel like you're working in a DOS application. That is because, in effect, you are! MALLET has no graphical interface and runs entirely from the DOS-style command line, which means hours can be lost tediously searching through lines of commands, only to find that you struck a random key or typed a / instead of a \. As it was described to me, "many a graduate student has been frustrated by this application." I should have known then that I needed to block off quite a bit of time to wrestle with this program, as I too have found that statement to be true.
BUT! MALLET does offer a great toolkit for topic modeling. It belongs to what Graham, Weingart, and Milligan describe as "a family of computer programs that extract topics from texts. A topic to the computer is a list of words that occur in statistically meaningful ways. A text can be an email, a blog post, a book chapter, a journal article, a diary entry – that is, any kind of unstructured text."(1) What MALLET does is compile information into giant lists of topics and key words, as seen in the pictures below. The output can be read by navigating to the directory and opening the text file. For larger files, MALLET also lets you raise the memory available to it through Java to 1, 2, or 4 GB, which is done with the proper command sequence.
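Since so much of the frustration comes from mistyped commands and slashes pointing the wrong way, one option is to let a short Python script assemble the command lines for you. The sketch below only builds and prints the two commands (import, then train) for pasting into Command Prompt; it does not run MALLET. The `import-dir` and `train-topics` commands and their flags follow the Programming Historian lesson cited above, while the install folder `C:/mallet`, the input folder, and the output file names are assumptions for illustration.

```python
from pathlib import PureWindowsPath


def mallet_commands(mallet_home, input_dir, num_topics=20):
    """Build the import and training command lines as lists of arguments.

    PureWindowsPath turns any forward slashes into backslashes, so the
    printed paths come out in the form Command Prompt expects.
    """
    mallet = str(PureWindowsPath(mallet_home) / "bin" / "mallet")
    corpus = "tutorial.mallet"  # assumed name for the imported corpus file
    import_cmd = [
        mallet, "import-dir",
        "--input", str(PureWindowsPath(input_dir)),
        "--output", corpus,
        "--keep-sequence",
        "--remove-stopwords",
    ]
    train_cmd = [
        mallet, "train-topics",
        "--input", corpus,
        "--num-topics", str(num_topics),
        "--output-topic-keys", "tutorial_keys.txt",        # assumed file name
        "--output-doc-topics", "tutorial_composition.txt",  # assumed file name
    ]
    return import_cmd, train_cmd


# Hypothetical install location and input folder
import_cmd, train_cmd = mallet_commands("C:/mallet", "sample-data/web/en")
print(" ".join(import_cmd))
print(" ".join(train_cmd))
```

Building the commands this way keeps every path in one place, so a typo only has to be fixed once rather than hunted down in a long line of text.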
The great thing about MALLET is that you can import your own data, create directories, and compile your documents in one place simply by knowing the proper commands. Be advised, though, that you will have to create your text lists in Notepad or Word and save them as MS-DOS text; then, through a series of commands, MALLET compiles that information into a directory. You can export the results into an Excel file and continue to use them in various applications, QGIS for example.
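As a rough illustration of the export step, the Python sketch below converts a topic-keys listing into CSV text that Excel can open. It assumes each line of the listing has three tab-separated fields (topic number, a weight, and the topic's top words); check your own output file before relying on that layout. The function name `topic_keys_to_csv` and the two sample lines are made up for the example and are not real MALLET output.

```python
import csv
import io


def topic_keys_to_csv(keys_text):
    """Convert tab-separated topic-keys text into CSV text.

    Assumed input layout per line: topic id <TAB> weight <TAB> top words.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["topic", "weight", "top_words"])
    for line in keys_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        topic_id, weight, words = line.split("\t", 2)
        writer.writerow([int(topic_id), float(weight), words.strip()])
    return out.getvalue()


# Made-up sample in the assumed layout
sample = "0\t2.5\trome empire ancient\n1\t2.5\tarchitecture column temple\n"
csv_text = topic_keys_to_csv(sample)
print(csv_text)
```

To use it on a real file, you would read the file's contents into a string, pass that string to the function, and save the returned text with a .csv extension.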
1 Shawn Graham, Scott Weingart, and Ian Milligan, "Getting Started with Topic Modeling and MALLET," The Programming Historian 1 (2012), https://programminghistorian.org/en/lessons/topic-modeling-and-mallet.