Digital Tool Activity for Week 5
This week, we’re trying out topic modeling in order to get a “sense” of the topics we can find in UNSONG from a computational perspective! Definitions of topic modeling are included in our lecture, in the notes above, and in Ted Underwood’s article (here).
Most importantly, as with each of the tools we’ve used so far, the goal here is not to produce a “perfect” model. In fact, this may not even be possible: your results might end up looking VERY different from what you know of the novel, and that’s perfectly normal!
Topic Modeling Tool Tutorial
STEP 1: Here in Canvas, go to Files from the left-hand toolbar, find the “UNSONG.txt” folder, and download it! I’d recommend saving it to your desktop or somewhere you can easily see/access it.
Step 1: download the file with every chapter of UNSONG
STEP 2: Download the topic modeling tool (here: https://senderle.github.io/topic-modeling-tool/documentation/2017/01/06/quickstart.html). Then, from your Downloads folder, extract the files to a new folder! (Your computer will suggest a default location, but you can choose another if you prefer.)
Step 2: extract the folders of the tool once you download it
STEP 3: In the extracted folder, open the tool itself: it will be the one labeled an “Application”!
Step 3: Open the tool
STEP 4: Click “Input Directory” and set it to the UNSONG.txt folder you downloaded in Step 1! Then click “Output Directory” and choose a folder where you want results to go (I’d recommend creating a folder just for this). Then click “Learn Topics,” sit back, and watch it happen!
Step 4: set folders for input and output, then run the tool!
STEP 5: Open the folder you selected for output in Step 4, where you’ll find two types of output. The CSV files work like spreadsheets and show you all the data in rows and columns; the HTML files are pages you can open in your web browser.
Step 5.1: You should see CSV and HTML outputs
You can play around with either!
However, only the HTML one is *required* for Assignment #4 this week. When you open that folder (output_html), you should see something like the screenshot below. Click that to open up your initial findings and see what “topics” the tool has identified!
Step 5.2: You can use either CSV or HTML, but you’ll need at least the HTML for Assignment 4
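If you’d like to peek inside one of the CSV outputs without opening a spreadsheet program, here is an optional, minimal Python sketch. It only reads the header row and the first few rows of whatever CSV file you point it at. The function name `preview_csv`, the folder name `my_output_folder`, the subfolder `output_csv`, and the file name `example.csv` are all placeholders chosen for illustration, not names confirmed by the tool, so swap in the path to a file you actually see in your output folder.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return the header row and the first n_rows data rows of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    # Hypothetical path: replace it with a CSV file from your own output folder.
    example = Path("my_output_folder") / "output_csv" / "example.csv"
    if example.exists():
        header, rows = preview_csv(example)
        print(header)
        for row in rows:
            print(row)
    else:
        print(f"No file found at {example}; edit the path first.")
```

This is not needed for Assignment #4; it is just another way to look at the same data the spreadsheet view would show you.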
STEP 6: Our assignment asks you to run the tool at least 3 times: the first time on the default settings, and then the 2nd/3rd times with changes of your own choosing. So here’s how you can make those changes.
Click on “Optional Settings,” which will lead to the second screen pictured here:
Step 6.1: Click on “Optional Settings”
Step 6.2: Make whatever changes you want to try out!
Some of your different options are underlined in blue in this second screenshot above. You can:
Add a spreadsheet with metadata: that is, extra information about each file you feed the tool (such as a title or date), which the tool can attach to its results. (More specific directions included on the page: https://senderle.github.io/topic-modeling-tool/documentation/2017/01/06/quickstart.html)
Add a spreadsheet with stopwords: that is, words you DON’T want the tool to consider
Change “number of iterations”: that is, change how many times each learning phase runs. Give it more or less time to learn than the default 400 iterations!
Change “number of topic words to print”: that is, decide how many words you want in a predicted topic cluster. Give it more or fewer than the default 20 words to describe what it thinks forms a topic!
Change “number of topics” (first screenshot): that is, change how many topics you want it to predict.
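To get a feel for what two of these settings do, here is an optional toy sketch in Python. It is NOT topic modeling and not how the tool works internally: it simply counts words in one sentence. It does show, on a small scale, why a stopword list changes which words rise to the top, and what it means to ask for a fixed number of words. The function name `top_words`, the sample sentence, and the stopword list are all made up for illustration.

```python
import re
from collections import Counter


def top_words(text, stopwords, n_words):
    """Count words in text, skip the stopwords, and return the n_words most frequent."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [token for token in tokens if token not in stopwords]
    return [word for word, _ in Counter(kept).most_common(n_words)]


# Made-up sample sentence, not a quotation from the novel.
sample = "the angel and the whale and the angel said the name of the whale to the angel"

# With no stopwords, the most common word is a filler word.
print(top_words(sample, stopwords=set(), n_words=2))  # ['the', 'angel']

# With a small, hypothetical stopword list, content words come first.
print(top_words(sample, stopwords={"the", "and", "of", "to"}, n_words=2))  # ['angel', 'whale']
```

Changing `n_words` here is loosely analogous to changing “number of topic words to print”: the underlying counts stay the same, and you just see a longer or shorter list.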
Pick whichever one (or ones!) of these options sounds most interesting or promising to you. Then when you’ve made the changes you want to try, click “Ok” as circled and let the model run again! Like before, you can get the spreadsheets (CSV files) or open the HTML documents in your browser to record your results for the 2nd and 3rd times.
Make at least 1 change each time and run the model 2 more times (for a total of 3 times) to see what results for “topics” it gives you when you make various changes. Also, be sure to save/keep each new set of results (whether through copy-paste, screenshots, etc.) so that you can report on them in Assignment #4!
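One optional way to keep each set of results is to copy the whole output folder to a separate, labeled folder after every run. The Python sketch below does that, on the assumption (not confirmed here) that a later run might replace files in the same output folder. The function name `archive_run` and the folder names `my_output_folder` and `my_saved_runs` are placeholders for illustration; use whatever folders you chose in Step 4.

```python
import shutil
from pathlib import Path


def archive_run(output_dir, archive_root, label):
    """Copy one run's output folder to archive_root/label and return the new path."""
    source = Path(output_dir)
    destination = Path(archive_root) / label
    if destination.exists():
        # Refuse to overwrite an earlier saved run.
        raise FileExistsError(f"{destination} already exists; pick a new label.")
    shutil.copytree(source, destination)
    return destination


if __name__ == "__main__":
    # Hypothetical folder names: replace them with your own.
    output_folder = Path("my_output_folder")
    if output_folder.exists():
        saved = archive_run(output_folder, "my_saved_runs", "run1_default_settings")
        print(f"Saved a copy to {saved}")
    else:
        print(f"No folder found at {output_folder}; edit the path first.")
```

Using a label that names the change you made (for example, the number of topics you picked) makes it easier to match each saved folder to the settings you report in Assignment #4. Copy-paste or screenshots work just as well.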