Categories
Uncategorized

Sentiment Analysis: Can Machines Read Emotions

“It is manifestly impossible to read everything, and it has always been so. The utility of the digital corpus— despite its vaunted claims of “increased access”— only serves to make the impossibility of comprehensive reading more apparent.”

 

Of all the articles we have read this semester, I believe this is the most relevant, insightful, and true statement by any author. Ramsey is spot on in his defense of the digital humanities because, in this one statement, he both admits its flaws and exemplifies its purpose. The goal of Digital Humanities is not to produce comprehensive readings of the texts being analyzed, but to provide data that humans could never produce themselves. As he puts it, “it is unlikely that a human being, even if asked to name only the top three words in each text, would produce these lists precisely as the machine gives them to us.” This is the true power of digital analysis: it provides information otherwise unobtainable.

On the other hand, when it comes to comprehensive readings, humans possess the ability to evaluate the semantic meaning of texts and understand context on a level a computer could never achieve. Ramsey, discussing students’ biases about books, says, “Books come to them as high or low, deep or shallow, hard or easy, read ‘for pleasure’ or read ‘for class,’ with dozens of gradations in between.”

This kind of emotional or semantic analysis is deeply and inherently human, and something a computer would not be able to comprehend in full. Yes, a computer can understand that certain words are inherently shaded one way or another, but it is prone to mistakes. “Awesome,” for example, could be used to describe Hiroshima or your cousin’s wedding last weekend. Someone’s heart can skip a beat when they are about to crash a car, or when they are leaning in for a kiss. Words, phrases, context, and language are all inherently human and require complex thought to understand.
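To make the “awesome” problem concrete, here is a minimal sketch of the simplest kind of machine sentiment scoring, where every word carries one fixed score. The lexicon, the function name, and both sample sentences are made up for illustration; they do not come from Ramsey or from any real sentiment tool.

```python
import re

# Hypothetical toy lexicon: each word gets one fixed polarity score.
TOY_LEXICON = {"awesome": 1.0, "wonderful": 1.0, "terrible": -1.0, "awful": -1.0}

def lexicon_score(sentence, lexicon=TOY_LEXICON):
    """Sum the fixed scores of every known word; context is ignored entirely."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(word, 0.0) for word in words)

# Two invented sentences that use "awesome" in very different senses.
bombing = "The destruction of Hiroshima was awesome in its scale."
wedding = "My cousin's wedding last weekend was awesome."

print(lexicon_score(bombing))
print(lexicon_score(wedding))
```

Both sentences receive the same score, because a word list like this has no way to tell dread from delight. More sophisticated methods exist, but this is the basic mistake I have in mind.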

For these reasons, and given the evidence presented by Topic Modeling, I believe it is currently beyond the capacity of computers to analyze semantics and emotions.

 


Creating a Corpus for Dummies

Upon commencing my journey to create a corpus serviceable for literary, yet empirical, investigation, I decided to look into a topic that seemed to be just as mysterious to the writers as it is to the readers. By doing this I hoped to discover facts or ideas about the texts that the authors had not consciously thought to include, ones possibly mired in their subconscious. To this end I immediately thought of J.R.R. Tolkien, who admitted to having created a complex mythology and history (feasibly that is what one does in the trenches of France during WWI) before even putting pen to paper. He wrote about a fictional world with fictional characters, so rich and deep that those who find its pages often find themselves engrossed in its totality. How then, a scholar might ask, could this world relate to or be derived from his experiences? The metaphors are rampant throughout the novel, with Tolkien even stating, “My ‘Sam Gamgee’ is indeed a reflexion of the English soldier, of the privates and batmen I knew in the 1914 war.” What then is the relationship between the God of Middle-earth, Eru Ilúvatar, and Tolkien’s Catholicism?

In order to offer a point of reference, I decided to look into the religious metaphors of a contemporary, compatriot, and friend of Tolkien, C.S. Lewis. Lewis was far more blunt regarding his faith. He was a well-known apologist, and the only people unable to understand the references to Christianity in The Chronicles of Narnia were its intended readers, children. How, though, does Lewis portray the character Aslan, and how does his representation of a God differ from Tolkien’s?

The next step was to find a set of texts that I thought encompassed their respective writings as a whole. Upon online investigation I found PDFs of certain texts from both authors, since they seemed to be out of copyright in Canada. I also received a large chunk of text from a former student of Professor Faull’s. Once I was satisfied with what I had gathered, I found myself with five bodies of text from each author. They are as follows:

Tolkien: The Lord of the Rings, The Silmarillion, The Hobbit, a series of letters to various recipients, and Unfinished Tales: The Lost Lore of Middle Earth.

Lewis: The Space Trilogy, The Chronicles of Narnia, The Screwtape Letters, Letters to Malcolm, and a series of unpublished letters.

In selecting these texts, I hoped to represent the work of each author not only through their famous fiction, but also through their lesser-known texts.

The difficult part of my task is cleaning the corpus, as I am dealing with a huge amount of writing. Luckily for me, the copy of LOTR I received has already been cleaned and is ready to use, but that is merely the tip of the iceberg. The Chronicles of Narnia is of similar length and will need to be parsed in order to be readied for analysis. There is much work to be done in this department.

Finally, I will discuss my search parameters for the actual analysis. First, I will simply search for religious terms and the adjectives used to describe them. Next, I will repeat the search, but this time for the fictional characters that represent higher powers within the novels. I hope that this will provide some good insight into each author’s representation of their beliefs.
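The search described above could be sketched roughly as follows. This is only an illustration under some loud assumptions: it treats the word immediately before a search term as a stand-in for "the adjective describing it" (real adjective detection would need part-of-speech tagging), it only handles single-word, unaccented terms, and the sample sentence is invented rather than taken from Lewis or Tolkien.

```python
import re
from collections import Counter

def words_before_terms(text, terms):
    """Count the word that immediately precedes each search term.

    Assumption: the preceding word is a rough proxy for a describing
    adjective. It will also pick up articles and other non-adjectives.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    wanted = {term.lower() for term in terms}
    counts = {term: Counter() for term in wanted}
    for previous, current in zip(tokens, tokens[1:]):
        if current in wanted:
            counts[current][previous] += 1
    return counts

# Invented sample text, used only to show the shape of the output.
sample = "The great lion spoke. The golden lion roared, and the great lion smiled."
result = words_before_terms(sample, ["lion"])
print(result["lion"].most_common())
```

Running the same function once with a list of religious terms and once with the names of the characters would give two sets of counts to compare between the authors.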


Distance Reeding (Talking Excessively from a Distance)

Argument

In a modern world where countless new publications are released every day, and countless more from ages past are yet to be rediscovered, it seems almost impossible to account for all the opinions, stories, cultures, and histories that these works carry within them. Even the speediest of readers could not hope to conquer all the literature the world offers. This is where a practice called distant reading can help the aspiring bookworm grasp an otherwise unimaginable quantity of work.

 

The coiner of the term distant reading, Franco Moretti, quoted in his aptly titled book “Distant Reading” Johann Wolfgang von Goethe’s statement, “Literature is the fragment of fragments”. I believe this sums up how distant reading attempts to tackle such large volumes of books. Distant reading attempts to capture the essence of these fragments, finding the underlying heart of the work and using it to produce new empirical evidence of otherwise unseen themes. This is the first reason that distant reading is so important for modern literary analysis. Not only does it process an extreme quantity of text in a short period of time, but as Underwood puts it, “we’re (distant readers) trying to reveal large-scale patterns that wouldn’t be evident in ordinary reading”. This deeper analysis of patterns unseen to the close reader’s eye is what makes the practice so worthwhile. The second reason for the importance of distant reading is similarly well summed up by Underwood: “this representation (digital representation) of text is radically different from readers”. When works are rendered digitally through distant reading techniques, the picture drawn for the observer will usually be starkly different from the picture drawn for the close reader. This brings us to the next step in the analysis: putting the two types of reading side by side in order to compare them and learn from their differences. This process is called differential reading.

Clement writes that “differential reading … positions close and distant reading practices as both subjective and objective methodologies”, continuing later with, “Both close and distant reading practices can facilitate interpretation through subjective and objective means”. Differential reading is the contrasting of these two practices, offering both the intimate, first-hand, human-biased knowledge discovered in close reading and the empirical, underlying, objectively computed data discovered by distant reading. It offers a view into a work from different angles, providing the ability to back up claims made by the data and debunk the opinions presumed by the reader, as well as vice versa. Layering these two methods on top of each other through differential reading is best described by Clement’s simile: “This many-eyed perspective might be like ‘eye vision,’ which involves shooting a dynamic event, such as a soccer game, from multiple cameras placed at different angles”. This ability to offer an exclusive view into the text is what makes differential reading an essential tool in the literary analyst’s toolbox. Using this method unlocks a plethora of opportunities, as Hoover writes in his article. To paraphrase a few: testing hypotheses, supporting the claims of critical work, investigating how authors differentiate between the voices of characters, discovering radical shifts in style, etc. All of these seemingly unknowable answers can be found using differential reading.

While this is a powerful tool, there are certain drawbacks and difficulties. Distant reading is not yet a perfect method.

  1. Uncertainty of the Role of Digital Analysis in the Humanities- Some doubt the validity of the empirical data provided by digital analysis. This, though, is a falsely held doubt, as Hoover writes: “Many kinds of evidence produced by statistical methods are simply not accessible without a computer”.
  2. Dealing with Human Emotion in Texts- Computers are unable to interpret human emotion, and analysis regarding mood, if not discoverable through word frequency or other parameters, must often be carried out through close reading.
  3. Copyright Issues- Many texts are not out of copyright yet, such as most published after 1927 (in America). Furthermore, laws differ between countries, making things even more complicated.
  4. Many Texts are not even Digitized- Even if everything above checks out, the work you are looking for might not be available in digital form.

Experimentation 

To show the power of the tool, I decided to do an analysis of the reading we did by Clement. I began by simply cleaning her article and throwing it into a word cloud to count word frequency.
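For anyone curious what the word cloud is doing underneath, the counting step can be sketched in a few lines. This is a minimal illustration, not the tool I actually used: the stopword list is a tiny hypothetical one, and the sample sentence is my own stand-in rather than a line from Clement’s article.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; real tools use longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "as", "that", "both"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into words, and count the non-stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)

# Stand-in sentence, not a quotation from the article.
sample = "Close reading and distant reading both treat the text as data."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

A word cloud then simply draws each word at a size proportional to its count.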

[Screenshot: word cloud of Clement’s article]

Some of the terms that arose from the wordle were no surprise. One thing the cloud shows, which may not be so apparent at first, is how closely the digital part of distant reading is tied to the actual literary part. “Reading” and “literary” are almost as prominent as “digital” and “text”, showing that these digital methods are more connected to the works than the idea of “empiricizing” a novel would suggest.

Next, I decided to throw the most common terms into Ngram, in order to see how their frequency in the article compares with their frequency in all digitized English texts.

[Screenshot: Ngram chart of the most common terms]

This shows some interesting trends. Most interesting seems to be the rise of the word “text”, shown in blue. “Text” had a huge jump, but was sitting in an alright place to begin with, unlike “data”, shown in orange. This is probably because of the shift in the meaning of the word. As the definition has expanded from a body of writing to include an instant message, the word has become more popular.

Furthermore, I decided to look into the New York Times’ records of articles and see the trends for the same words.

[Screenshot: New York Times article trends for the same words]

Looking at this and comparing it to the Ngram is a great way to summarize the power of distant reading. By looking at these two graphs I now see that “text” finds itself at the bottom in the NYT, while in Ngram it is second from the top. This could suggest many things, for example, that writing about modern texting is not newsworthy and is therefore not considered for articles. This would explain why it has grown the least compared to the other four words.

Just from looking at a wordle of one of my assignments, I have now inferred all of this. I have analyzed the data and drawn conclusions that I set out not knowing I could discover. This is why distant reading can be a powerful tool.