Final Reflection

Allie O’Connell

May 1, 2016

A Final Reflection

Throughout the semester, I have struggled, grown, and, most importantly, learned in this class. While learning what a corpus is and creating one myself, my first and most important task was building one that had actual meaning. With the upcoming presidential election, I thought it would be interesting to create a corpus that highlights the differences and similarities between the political speeches of women and men. I chose speeches from Hillary Clinton and the current president, Barack Obama. I guided my exploration by always returning to my essential question: “What differences and similarities do political speeches have between men and women?” As simple as this question may be, it allowed me to stay on task and uncover some interesting information.

In the beginning of the course, I focused heavily on the construction and cleaning of my corpus. I used five speeches from Obama and five from Hillary. With Voyant and AntConc, I began to identify the commonalities and distinctions between their speeches. I was first drawn in by word frequencies, and in particular by how the use of the word “women” differed between Obama and Hillary. Voyant was the gateway to my investigation, leading me to explore this notion further in AntConc. I was able to make some valuable distinctions: Hillary addresses the people as a whole, while Obama tends to refer to them as “men and women,” subconsciously separating himself from what used to be a minority group. Additionally, the few times Hillary uses the phrase, she says “women and men,” subconsciously putting women first to establish power among women. Obama also doesn’t use the word “women” at all unless he is distinguishing between groups of people, whereas women are one of the primary subjects of Hillary’s speeches. This is a natural result, given that Hillary is a woman and these issues are not only personal to her but also a significant part of her campaign as she tries to appeal to female voters.

Screen Shot 2016-05-06 at 4.45.21 PM

Obama:

Screen Shot 2016-05-06 at 4.46.13 PM

Hillary:

Screen Shot 2016-05-06 at 4.46.02 PM    Screen Shot 2016-05-06 at 4.46.54 PM Screen Shot 2016-05-06 at 4.46.43 PM
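The kind of count behind the frequency comparison above can be sketched in a few lines of Python. This is only an illustration, not what Voyant or AntConc do internally: the function names are my own, and the two sample texts are made-up placeholders, not quotations from the speeches in the corpus.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequency(text, word):
    """Return how often `word` occurs per 1,000 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)

def count_phrase(text, phrase):
    """Count exact occurrences of a phrase, ignoring case."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))

# Placeholder sentences (assumptions for illustration only).
sample_a = "We thank the men and women who serve. The economy must grow."
sample_b = "Women and men deserve equal pay. Women lead. Women vote."

for name, text in [("sample_a", sample_a), ("sample_b", sample_b)]:
    print(name,
          round(relative_frequency(text, "women"), 1),
          count_phrase(text, "men and women"),
          count_phrase(text, "women and men"))
```

Using a rate per 1,000 words rather than a raw count matters because the speeches are different lengths; a longer speech would otherwise look like it mentions women more often simply because it has more words.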

As we progressed through the semester, I went on to work with Jigsaw. I continued my research on the word “women,” focusing on Hillary’s use of it and Obama’s lack of use. By making a word tree, I was able to identify the contexts in which Hillary uses the word. For example, she often discusses “women’s rights at home and around the world” and “women who are raising children on minimum wage.” These are all very important issues that she emphasizes repeatedly. Again, this is a natural result of gender differences and of the issues personal to each political figure.

Screen Shot 2016-05-06 at 4.47.06 PM
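A word tree is built from the words that surround each occurrence of a keyword. A minimal keyword-in-context sketch in Python gives the same raw material. It is not Jigsaw's implementation; the function name and the `width` parameter are assumptions of mine, and the sample text is a placeholder built around the two phrases quoted above rather than a real speech.

```python
import re

def keyword_in_context(text, keyword, width=4):
    """Return (left, match, right) tuples with up to `width` words per side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        # Treat the possessive "women's" as a match for "women".
        base = re.sub(r"'s$", "", token.lower())
        if base == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Placeholder text (an assumption for illustration only).
sample = ("We must defend women's rights at home and around the world. "
          "We must support women who are raising children on minimum wage.")

for left, match, right in keyword_in_context(sample, "women"):
    print(f"{left:>25} | {match} | {right}")
```

Reading down the right-hand column shows what follows the keyword each time, which is the branching a word tree draws.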

Working from the new information I gained from Jigsaw, I then started working with MALLET and topic modeling. Rather than giving me context, this platform outlines the specific topics and themes that Hillary and Obama discuss. In line with the previous patterns, Hillary again discusses things like women’s rights, children, and family. Obama, on the other hand (as seen below), focuses more on the future, the economy, and security. It is also important to acknowledge that my corpus has only five speeches each for Obama and Hillary, so this may not be an accurate representation of all their speeches. But assuming it is accurate as a whole, these differences could help explain why Obama won the election and Hillary did not.

Obama:

Screen Shot 2016-05-06 at 4.47.37 PM

 

Hillary:

Screen Shot 2016-03-25 at 1.50.08 PM

The next platform I used was Alchemy. With Alchemy, I was able to perform sentiment analysis on my corpus, detecting general moods and tones for each political figure. In one of our readings, Ramsay asks, “Who decides what sentimentality is?” (Ramsay). Even though Alchemy can help me uncover certain sentiments, it is also an arbitrary process. Because a computer is doing the work, and because words and emotions are easily misinterpreted and can’t be measured statistically the way numbers can, there is a lot of room for miscalculation. This was really emphasized to me in class when we did the Gettysburg Address exercise, in which we had to decide manually which words would go in which category. The entire class had different results, which goes to show how arbitrary some of these processes are and that we can’t always count on the computer to give us a one hundred percent accurate result.
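The arbitrariness described above can be shown with a small word-list sketch in Python. This is not how Alchemy works internally, which the class did not see; it is a simple counting approach I am assuming for illustration. The sentence comes from the Gettysburg Address, but the two word lists are hypothetical stand-ins for what two classmates might have written down.

```python
import re

def label_sentiment(text, positive, negative):
    """Label text by counting words from two hand-made lists."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

sentence = ("The brave men, living and dead, who struggled here, have "
            "consecrated it, far above our poor power to add or detract.")

# Hypothetical lists: each reader sorted the same words differently.
reader_a = {"positive": {"brave", "living", "consecrated", "power"},
            "negative": {"dead", "struggled", "poor"}}
reader_b = {"positive": {"brave", "consecrated"},
            "negative": {"dead", "struggled", "poor", "detract"}}

label_a = label_sentiment(sentence, **reader_a)
label_b = label_sentiment(sentence, **reader_b)
print(label_a, label_b)  # prints "positive negative"
```

The same sentence flips from positive to negative only because the two readers categorized a few words differently, which is the point of Ramsay's question.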

As you can see below, Hillary has a fairly even distribution of tones across mixed, neutral, positive, and negative. Obama’s distribution, on the other hand, is overwhelmingly uneven. He speaks about a small number of subjects in a neutral or mixed tone, an even smaller number positively, and a very large number negatively. His acknowledgement of the problems in our country could have been one of the reasons he won the election: by identifying the problems, he may have led the population to assume that he was striving to make a difference. I can’t assume that all women political figures speak in a more positive tone than their male counterparts, but this would be a really interesting idea to look into.

Hillary:

Screen Shot 2016-05-06 at 4.47.58 PM

Obama:

Screen Shot 2016-05-06 at 4.48.04 PM

Evidently, throughout my exploration of my corpus, I have been able to gather useful information that helped me come to important conclusions about how women and men in politics differ in addressing the country and gender roles. Additionally, I was able to identify some key topics that males and females tend to discuss and, just as importantly, the tone and mood in which they discuss them. Pennebaker says, “The precise words you used to communicate your message revealed more than you can imagine” (Pennebaker). This conclusion couldn’t be more true of my results. For example, when Obama says “men and women,” he isn’t just addressing our country; subconsciously, he is saying so much more. He is splitting the population into two groups and putting one gender before the other. He is acknowledging the difference, and almost the inequality, between men and women. I have learned so much throughout the semester and could truly spend a lifetime exploring the differences and similarities in my corpus as well as my research question. This semester sparked my interest in exploring my research question further and in continuing to notice the importance of diction and mood, not just in texts but in everyday life.

 

Bibliography

Pennebaker, James W. The Secret Life of Pronouns: What Our Words Say about Us. New York: Bloomsbury, 2011.

Ramsay, Stephen. Reading Machines: Toward an Algorithmic Criticism. Urbana: U of Illinois, 2011.


Machine Reading

Throughout the past two weeks, our class has focused heavily on exploring texts on a macro and micro level. With the use of different digital platforms, we have been able to explore our corpora in a more complex and sophisticated manner than ever before. Part of the reason this exploration was so difficult to interpret is that these platforms are very statistical and numerical rather than user friendly like Voyant, for example. So the first step in really understanding what we were looking at was adapting ourselves to its visual complexity. Throughout the entire process, though, it was important to remember that our research was not computer driven; instead, the computer was the tool used to drive us toward our answer or answers. Pierazzo says, “The challenge is therefore to select those limits that allow a model which is adequate to the scholarly purpose for which it has been created” (Pierazzo). The scholarly purpose is for us to create a bigger picture of the text and our inferences, but it is our job to decide how we use these digital platforms to do so. By using these platforms, we are able to look at our texts either zoomed in or zoomed out in order to ultimately support our interpretations.

When I first looked at the dendrograms in order to focus on the similarities and differences between the political speeches I was using, I was frazzled, to say the least. So I started off by inputting just Obama’s speeches and seeing what I could come up with. A score of 1 indicates tight, distinct clusters, whereas numbers closer to 0 represent overlapping clusters. This is used to show how similar each speech is to the speeches as a whole. As we can see below, most values are much closer to 0, indicating that the speeches might have some overlapping and similar word use.

Screen Shot 2016-04-18 at 5.39.26 PM
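The score described above (1 for tight, distinct clusters; near 0 for overlapping ones) behaves like a silhouette score, although the tool did not name it, so that identification is my assumption. Here is a minimal Python sketch of how such a score can be computed. The two-dimensional points are hypothetical; a real analysis would presumably use word-frequency values for each speech instead.

```python
from math import dist  # available in Python 3.8+

def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance).

    Assumes at least two distinct cluster labels.
    """
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the same cluster
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == label and j != i]
        if not same:
            scores.append(0.0)  # usual convention for a one-member cluster
            continue
        a = sum(same) / len(same)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) - {label}
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical points: two well-separated groups vs. two interleaved groups.
tight = [(0, 0), (0, 1), (10, 10), (10, 11)]
mixed = [(0, 0), (0, 2), (1, 1), (1, 3)]
labels = [0, 0, 1, 1]

print(silhouette_score(tight, labels))  # close to 1
print(silhouette_score(mixed, labels))  # close to 0
```

Each point's value compares how close it sits to its own group with how close it sits to the nearest other group, so a low overall score means the groups blur into each other.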

I then went on to look at Hillary’s speeches in the dendrogram. Her values tended to be around the .05 mark but were overall higher than Obama’s. This suggests to me that Hillary’s speeches differ more from one another in word use and stylometry.

Screen Shot 2016-04-18 at 5.39.15 PM

When I put in all of Hillary’s and Obama’s speeches together, I got even more complex results. The use of different colors and the high numbers suggest to me that their speeches are essentially pretty different and don’t have much in common. My results were not really that interesting because they are pretty predictable. It is only natural that Hillary’s speeches would differ from Obama’s because they are different people who speak differently. However, the fact that Obama’s speeches were seemingly more similar to one another could influence the way he imposes and reiterates his ideas to the American people, as opposed to Hillary and the more severe differences among her speeches.

Screen Shot 2016-04-18 at 5.38.59 PM

As we continued our exploration of text analysis, we moved on to oXygen, which was an even more complicated notion to me than the dendrograms. With oXygen, we look at each word closely and actually tell the computer how to interpret the text. This is important because it further shows that computer programs are only a tool to further our understanding rather than the entire purpose of our corpus. For example, below I labeled each time “Ophelia” was referred to by name or by a pronoun such as “her” or “she” to understand how people view her.

Screen Shot 2016-04-15 at 1.51.20 PM

Pierazzo says, “It is the argument of this article that editions as we know them from print culture are substantially different from the ones we find in a digital medium” (Pierazzo). Therefore, it was essential that we use these editions to extract a greater meaning and form a deeper understanding of this text as well as our own corpus. Pierazzo also states that it is difficult to choose “which features of the primary source are we to reproduce in order to be sure that we are following ‘best practice’” (Pierazzo). This also goes to show how arbitrary a practice this is, because the computer is a tool that we use rather than something that simply gives us all the answers. Moreover, in using all these digital platforms, I was able to see their flaws as well as all that they have to offer.

Topic Modeling and Sentiment Analysis

In this day and age, technology is so advanced that it is expected to be capable of completing just about any task, regardless of its difficulty. Reading emotions, though, is a task that is less numerical and statistical and relies instead on word structure, diction, and tone, so it is definitely no walk in the park for any machine. But with new programs and digital platforms, machines are coming closer and closer to being capable of reading sentiment with increasingly accurate results. Topic modeling allows machines to group words together to create a list of the topics being discussed. Ramsay asks, “Who decides what sentimentality is?” So before using digital tools to help us, manually choosing words that fall under a specific topic for the Gettysburg Address was a useful exercise. As we went around the class and realized how much our lists differed, it gave me a deeper understanding of how arbitrary topic modeling and sentiment analysis truly are, because words don’t have just one meaning or one way in which they can be used. This also relates to the importance of one’s familiarity with one’s corpus and the texts in it. Machines will often make mistakes, and it is the human’s job to know the corpus well enough to pick up on these glitches and misinterpretations in order to avoid false assumptions.

Personally, I was able to use MALLET to create two lists of topics, one for Hillary Clinton’s speeches and one for Barack Obama’s.

Hillary’s:

Screen Shot 2016-03-25 at 1.50.45 PM

Obama’s:

Screen Shot 2016-03-25 at 1.49.52 PM

Each of these topic lists allowed me to see what each political figure chooses to talk about most. These topics are worldly issues that are important to them. Although the topics are specific to them personally, it is still possible to relate them to a broader scale of all women and men in politics. For example, one topic that Hillary discusses is care for the children of working single moms, which appears as the 10th topic and is not an issue that Obama seems to address. This is a pretty predictable result, since that would be a more personal subject for a woman than for a man running for political office.

For Alchemy, I tested one of Hillary Clinton’s speeches to see what kinds of sentiment she uses, as shown below. Sentiment is a really important aspect of political speeches because analyzing it can show how strongly the sentiment in a speech correlates with the speaker’s success in winning a political position as well as in making a worldly difference.

Screen Shot 2016-04-04 at 4.59.55 PM

In his book, Ramsay says that Mueller’s lists “do not contain anything that one might call, at first glance an astonishing result.” Words that some may see as mundane and trivial are actually the most significant words in our corpora, allowing us to draw greater inferences. For example, when I first started analyzing my corpus, I was able to take words as simple as “men” and “women” to illuminate the difference in the way that men and women view their gender differences. In further analysis, I could also use Alchemy to see the sentiment with which the words “men” and “women” are used to further my knowledge of the subject.

Ramsay also states that “As with Mueller’s lists, one is behooved to go further by examining the language from which these results are drawn.” I have come to realize that it is not just the vocabulary itself that is important in sentiment analysis and topic modeling, but also the context in which the words are placed. This is really important to take into consideration because, as I saw for myself when doing the exercise on the Gettysburg Address, the words surrounding a topical word can be just as important as the word itself. Using topic modeling and sentiment analysis has only furthered my knowledge of and inferences about my corpus, and I can’t wait to explore all that is left to discover with these digital platforms!