I Get by with a Little Help from Distant Reading

Distant Reading

Franco Moretti, an Italian literary scholar, pioneered a research technique within the digital humanities known as “distant reading.” Distant reading, as the New York Times describes it, is “understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data” (Schulz 3). It is also referred to as “text analysis” or “textual analysis.”

Distant reading takes a text off the page and presents it through digital tools. This can reveal an overall picture that is not always evident through close reading, the traditional practice of reading a text carefully. Distant reading draws our attention away from what traditional reading teaches and uncovers how the patterns that emerge at a distance relate to those we see up close. Tanya Clement compares this process to turning a magnifying glass upside down. As Clement states, it is a method to “defamiliarize texts, making them unrecognizable in a way…that helps scholars identify features they might not otherwise have seen, make hypotheses, generate research questions, and figure out prevalent patterns and how to read them” (Clement 3). It is worth emphasizing the fresh perspective that text analysis brings to a text we might otherwise read in the usual way.

The practice of distant reading is becoming increasingly popular as new technologies emerge and scholars ask new questions.

Distant Reading: Changing Perspectives

Pretend you have just finished all of Shakespeare’s plays. You find each one fascinating in its own right, but you have forgotten their main ideas. Distant reading allows you to pull each play apart and extract hidden information without reading every play a second time. Then you can compare Hamlet, Macbeth, Othello, and the rest, and draw conclusions about each text or about Shakespeare’s ideas.

The digital humanist Ted Underwood points out that you can “identify distinctive vocabulary” (Underwood 15). This would give you an opportunity to pick out Shakespeare’s most frequently used words and analyze the diction of each play. From there, you can make inferences about why Shakespeare wrote the way he did or what his ultimate goal was in a given work.
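One simple way to sketch the idea of “distinctive vocabulary” is to compare how often each word appears in one text relative to another. The Python below is a minimal illustration under my own assumptions, not Underwood’s method: the function names count_words and distinctive_words are hypothetical, the add-one smoothing is just one possible choice, and the two “plays” are made-up snippets rather than real Shakespeare text.

```python
import re
from collections import Counter


def count_words(text):
    """Lowercase the text, split it into words, and return (word counts, total words)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return counts, sum(counts.values())


def distinctive_words(text_a, text_b, top_n=5):
    """Rank words in text_a by how much more often they occur there than in text_b.

    Add-one smoothing (an assumed, simple choice) keeps a word that never
    appears in text_b from causing a division by zero.
    """
    counts_a, total_a = count_words(text_a)
    counts_b, total_b = count_words(text_b)
    vocab_size = len(set(counts_a) | set(counts_b))

    def ratio(word):
        rate_a = (counts_a[word] + 1) / (total_a + vocab_size)
        rate_b = (counts_b[word] + 1) / (total_b + vocab_size)
        return rate_a / rate_b

    return sorted(counts_a, key=ratio, reverse=True)[:top_n]


# Hypothetical toy snippets standing in for two plays.
play_a = "the ghost the king revenge revenge"
play_b = "the witch the king ambition"
print(distinctive_words(play_a, play_b, top_n=2))  # ['revenge', 'ghost']
```

Words like “the” and “king” that both snippets share score low, while words that one play leans on score high, which is the intuition behind looking for a play’s distinctive diction.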

For example, here is my Wordle of Emily Dickinson’s poem “I cannot live with You.”

(Wordle of Emily Dickinson’s “I cannot live with You”)

This picture shows all of the words in her poem except for the most common English words. The most dominant word in the Wordle appears to be “Life,” so according to the Wordle, the poem’s main idea centers on “Life.” However, looking at the words without their context makes it difficult to understand the tone of the text. That is when “differential reading” becomes important.
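Roughly speaking, a word cloud like Wordle counts each word, leaves out a list of common “stop words,” and sizes the remaining words by how often they appear. The sketch below shows only that counting step in Python; the stop-word list, the function name word_frequencies, and the sample sentence are my own illustrative assumptions, not Wordle’s actual list or code, and the sentence is not a line from the poem.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; word-cloud tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "on", "i", "you", "with"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count each word in the text after lowercasing and dropping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


# Hypothetical sample sentence.
sample = "Life is a dream, and the dream is life; life goes on."
print(word_frequencies(sample).most_common(2))  # [('life', 3), ('dream', 2)]
```

The largest word in a cloud corresponds to the top of this list, but the counts carry no information about tone, which is why a close reading is still needed.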

 

Differential Reading

Clement describes differential reading as treating “…close and distant reading practices as both subjective and objective methodologies” (Clement 2). In this view, close reading allows for a critical analysis of a literary work, while distant reading acts as an “upside-down magnifying glass,” revealing hidden patterns to scholars.

For instance, with my Wordle of Emily Dickinson’s poem, it would help to read the poem myself and form my own interpretation. Then I can combine my ideas with the results of the digital tools to make a more accurate hypothesis.

Another digital humanist, David Hoover, notes that text analysis makes it possible to investigate “how and the extent to which authors differentiate the voices of characters or narrators…” (Hoover 3). These voices might appear in a novel, a play, or a poem. With Hoover’s point in mind, I could compare the tone and rhetoric of several Dickinson poems. I could even break a single poem into its stanzas and look for shifts in tone between them. The possibilities are endless with distant reading; there is always new information to find and new approaches to try.

Challenges to Distant Reading

There are some significant disadvantages to distant reading.

  1. Copyright Laws – As Hoover stated, “For texts not available in digital form, an electronic text can be created by scanning and OCR. Unfortunately, it is not entirely clear that this is legal for texts in copyright” (Hoover 13).
  2. Finding the Text – It is incredibly hard to find some texts online. Even if you are fortunate enough to find your text, there may be several editions from different editors and publishers, and it can be difficult to choose the version that best suits your research.
  3. Texts not in Digital Form – In this case, you can scan the text and run OCR (optical character recognition) on it, although you must keep copyright law in mind. Moreover, any drawings or markings on the original may not carry over.
  4. Sentiment – It is difficult for a computer to distinguish between emotions. As a reader, you bring your own perspective and form emotional responses from it.
  5. Expansiveness of Archives – The collections of certain digital archives may be too small for a complete analysis.
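To illustrate the sentiment problem, here is a deliberately naive, word-counting sentiment scorer in Python. The tiny lexicon and the function name naive_sentiment are hypothetical and chosen only for illustration; real sentiment tools are more elaborate, but the sketch shows how scoring words in isolation can miss something as basic as negation.

```python
import re

# A hypothetical, tiny sentiment lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"happy": 1, "love": 1, "hope": 1, "sad": -1, "despair": -1, "grief": -1}


def naive_sentiment(text):
    """Sum the lexicon scores of the words in the text, ignoring all other words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


print(naive_sentiment("I love you"))      # 1
print(naive_sentiment("I am not happy"))  # 1, even though the sentence is negative
```

A human reader notices the “not” immediately; the scorer does not, because it never looks at how the words relate to each other.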

This is only a partial list of disadvantages. There are others as well, but in most cases the benefits outweigh the drawbacks.

 

Example of a Challenge to Distant Reading

First, I made a Wordle of the Preamble to the United States Constitution.

(Wordle of the Preamble to the United States Constitution)

Next, I took the three most dominant terms: “establish,” “United,” and “States.” However, when I entered them into the Google Ngram Viewer, I combined “United” and “States” into the single phrase “United States,” an advantage of close reading.
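The difference between counting single words (1-grams) and the phrase “United States” (a 2-gram) can be sketched in Python using the Preamble itself. This is my own illustration, not how Google’s Ngram Viewer is implemented; ngrams is a hypothetical helper that splits the text into lowercase words and joins runs of n consecutive words.

```python
import re
from collections import Counter

PREAMBLE = (
    "We the People of the United States, in Order to form a more perfect Union, "
    "establish Justice, insure domestic Tranquility, provide for the common defence, "
    "promote the general Welfare, and secure the Blessings of Liberty to ourselves "
    "and our Posterity, do ordain and establish this Constitution for the United "
    "States of America."
)


def ngrams(text, n):
    """Return every run of n consecutive lowercase words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


unigrams = Counter(ngrams(PREAMBLE, 1))
bigrams = Counter(ngrams(PREAMBLE, 2))
print(unigrams["establish"], unigrams["united"], unigrams["states"])  # 2 2 2
print(bigrams["united states"])  # 2
print(unigrams["unitedstates"])  # 0: the run-together word never occurs
```

Counting the 2-gram keeps the phrase together instead of splitting it into two unrelated words. The last line also suggests one possible reason a search for a run-together form like “UnitedStates” behaves oddly: a tool that matches whole words has no such word to find.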

(Google N-gram of “establish” and “United States”)

I set the range from 1700 to 2008 to see how frequently the terms were used in books over time.
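The Ngram Viewer does not plot raw counts; as I understand its documentation, it plots each phrase as a percentage of all n-grams of the same length in that year’s books. A minimal Python sketch of that normalization, with made-up counts and a hypothetical function name relative_frequency, shows why it matters.

```python
def relative_frequency(counts_by_year, totals_by_year):
    """Return the phrase's share (in percent) of all n-grams published each year."""
    return {
        year: 100 * counts_by_year.get(year, 0) / totals_by_year[year]
        for year in sorted(totals_by_year)
    }


# Hypothetical numbers: the raw count quadruples, but far more text was printed.
counts = {1800: 5, 1900: 20}
totals = {1800: 10_000, 1900: 100_000}
print(relative_frequency(counts, totals))  # {1800: 0.05, 1900: 0.02}
```

Even though the phrase appears four times as often in 1900, its share of all printed n-grams falls, so the plotted line would go down rather than up.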

Then I used Bookworm: ChronAm, from the Culturomics project, to plot the same terms over time. However, Bookworm: ChronAm would not accept “United States,” even though it said you could enter a 2-gram (a two-word phrase). So I graphed “establish” instead.

Then I tried to graph “UnitedStates” as a single word. I did get a graph, but when I looked at the source texts, the only word highlighted in the articles was “United.” This shows that not every digital tool will do exactly what you want it to do.

Finally, I tried “The United States” and received an oddly shaped graph. The articles highlighted words like “here” and “mistakes,” which have nothing to do with “The United States.”

Summary of Distant Reading

Text analysis draws on both subjective and objective practices. While the objective practices produce mathematical output such as word frequencies, interpreting the meaning of a graph is subjective: it depends on knowledge of history, philosophy, and other subjects that cannot be quantified.

Underwood describes it as “…an interdisciplinary conversation about methods…” (Underwood 5). He also states that you may get drawn in and come across territory no one has discovered yet. Fortunately, that is where the fun lies: daring to climb to new heights and make real breakthroughs.

Works Cited

Schulz, Kathryn. “What Is Distant Reading?” The New York Times. The New York Times, 24 June 2011. Web. 31 Jan. 2016.

Clement, Tanya. “Literary Studies in the Digital Age.” Literary Studies in the Digital Age. 2013. Web. 31 Jan. 2016.

Hoover, David L. “Literary Studies in the Digital Age.” Literary Studies in the Digital Age. 2013. Web. 31 Jan. 2016.

Underwood, Ted. “Seven Ways Humanists Are Using Computers to Understand Text.” The Stone and the Shell. 04 June 2015. Web. 31 Jan. 2016.

 

“Bookworm.” Bookworm. Web. 31 Jan. 2016.

“Google Ngram Viewer.” Google Ngram Viewer. Web. 31 Jan. 2016.

“Wordle – Beautiful Word Clouds.” Wordle – Beautiful Word Clouds. Web. 31 Jan. 2016.

Discovering Digital Documents

In my previous blog post, I used Google N-grams and Wordle to explore texts such as Dr. Martin Luther King Jr.’s “I Have a Dream” speech and the first two paragraphs of the Declaration of Independence. Looking at the texts “off the page” through digital tools led to new assumptions and conclusions.

Now I am digging deeper into the most frequently used words, such as “segregation” and “government.” I also found specific newspapers that contain these words; in my last post, I had only connected the words to common historical knowledge. Now, using these research techniques, I can make a firmer connection between the terms, their history, and the context in which they were used.

 

First, I looked for historical newspapers that contained the word “government.” There are a great many of them, but I was interested in some of the earliest articles about the formation of our government. Unfortunately, Bookworm: ChronAm can only search historical documents dated between 1836 and 1922. Still, I found an issue of The Illinois Free Trader, published in Ottawa, Illinois, on Friday, July 3, 1840.
Since this issue was printed the day before July 4th, the paper focused mainly on government, rights, and liberties. One paragraph was particularly interesting: written from a liberal perspective, it describes America’s government as being in the hands of “pure republicanism.” From this we can get a sense of what some Americans thought in 1840. Keep in mind that this was only 64 years after the signing of the Declaration of Independence. Does this tell us that people believed early on that the government had too much power over the people? It is hard to draw a conclusion, since this paper might be biased and it is only one example. Still, looking through newspapers like this one can give us real insight into perspectives from different centuries.

(Bookworm:ChronAm of “government”)

Since I was unable to search for sources after 1922, I could not look for anything directly related to Dr. Martin Luther King Jr. Instead, I searched for “segregation” in Bookworm: ChronAm and found an article in The Broad Ax (Salt Lake City, Utah) from 31 January 1914. The article discussed how widespread segregation was in education. It states that “…Negro schools have been neglected, a large portion of our children are not in school…” It is remarkable that segregation in school systems was being discussed in 1914, yet it did not come before the Supreme Court until Brown v. Board of Education was argued in 1952. This shows that problems do not always go away quickly or effectively over time. Race is still an issue today, yet the goal of unity has persisted across generations.


(Bookworm:ChronAm of “segregation”)

IMPORTANT: Look at how use of “segregation” progresses up to 1922. The term becomes markedly more frequent in a short amount of time.


Digitalizing History

(Wordle of Martin Luther King’s “I have a Dream” speech)

Many dominant words stand out in Dr. King’s speech, such as “freedom,” “Negro,” “dream,” “day,” “ring,” “nation,” and “every.” There are many subordinate words as well, such as “racial,” “negro’s,” “Georgia,” and “colour.”

Looking at Dr. King’s speech through textual analysis (in this case, word frequency) allows for a better interpretation of it. Since the speech was written to motivate his followers to boycott, protest, and demonstrate until they had the same rights and privileges as any other United States citizen, it makes sense that “freedom” is the most frequent word in his speech. By repeating “freedom” so often, Dr. King helps his listeners understand his call to action and reemphasizes his central aspirations. Moreover, words like “dream” and “nation” convey the idea of achieving his goals and of becoming a united country where everyone has their rights.

On the other hand, the relative scarcity of words like “racial” and “negro’s” could be because Dr. King did not want to draw much attention to discrimination; he wanted to look to the future and capture the “dream.” Additionally, words such as “colour” and “Georgia” could carry a negative connotation. Dr. King did not want to fill his followers with bad thoughts; instead, he wanted to motivate them toward creating racial equality.

In short, this textual analysis reveals the psychological tactics and cleverness that Dr. King used throughout his speech.

(N-gram of MLK’s most significant words in “I Have a Dream”)

This is one of the most impressive N-grams I have come across. It clearly shows that from 1940 to about 1970, apart from one dip, the term “Negro” was used frequently. This makes sense given the civil rights movement and events like Brown v. Board of Education. In recent years, however, the term has largely disappeared from books, most likely because it has come to be regarded as a racial slur with a negative connotation.

“Nation” was more popular in texts of the early 19th century, likely because the Declaration of Independence and the Constitution had recently been signed and a new nation had just formed. It is also clear that “freedom” was not used much until about the 1940s, during World War II.

The most interesting progression for me is that of “dream.” Its rise suggests that, in a world full of tragedy and despair, writers are becoming more optimistic, as if they are trying to tell the world to have hope and believe in tomorrow. Alternatively, the word may be used more often because writing about technology is breaking new ground, “dreaming” about what technology could do.

Source Texts for MLK

During the upward trend and peak from about 1960 to 1970, many texts used the word “Negro,” for example, Black World/Negro Digest, Funnyhouse of a Negro: A Play in One Act, and Black Metropolis: A Study of Negro Life in a Northern City.

For “nation,” the sources between 1880 and 1977 include books like Redeemer Nation: The Idea of America’s Millennial Role and The Birth of the Nation: Jamestown, 1607.

Other Dominant Terms

(N-gram of MLK’s most significant words in “I Have a Dream”)

This N-gram illustrates a problem with distant reading. Although “one” has been used heavily since 1800, it is hard to tell in what context it appears. Because “one” is such a common word, many of the sources will have little to do with MLK’s speech. The same problem arises with “every.”
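One common way to check how a frequent word like “one” is actually used is a keyword-in-context (KWIC) concordance, which lists each occurrence together with its neighboring words. The short Python sketch below is my own illustration; the function name keyword_in_context and the sample sentences are hypothetical and are not drawn from the N-gram sources.

```python
import re


def keyword_in_context(text, keyword, width=3):
    """List each occurrence of keyword with up to `width` words on either side."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


# Hypothetical sentences: only the first use of "one" resembles the speech.
sample = "One day this nation will rise up. No one knows. One must be careful."
for line in keyword_in_context(sample, "one", width=2):
    print(line)
```

Reading the lines side by side makes it easy to see that “No one” and “One must” have nothing to do with “one day”; the same check would help with “every” or “powers.”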


(Wordle of The Declaration of Independence)

By far the most dominant words in the Declaration of Independence are “Government,” “powers,” “among,” and “happiness.” Some subordinate terms are “abuses,” “form,” “Creator,” and “Safety.”

I think distant reading in this case exposes weaknesses in how the Declaration was written. For example, words more closely related to independence, like “equal” and “liberty,” are subordinate. It’s odd that there is more emphasis on government in general than on the people.


(N-gram of the two most popular words in The Declaration of Independence)

“Government” was used heavily in books of the early 19th century, most likely because of the formation of the United States. After a period of lower usage, it became prominent again from the 1920s to the 1970s. I would assume this reflects the government’s role in World War II, the Vietnam War, the civil rights movement, and many other events; there were a plethora of issues during this time on which the government took action. After the 1970s, usage of “government” declined again. This could be because the government was involved in fewer major issues after 1970, with the exception of the 2001 attack on the World Trade Center; the N-gram shows a small peak around 2000 to 2001.

“Powers” is another problematic word, because the texts could refer to “supernatural powers” and the like instead of the powers of the government or the people. In fact, looking at the Wordle alone, it is unclear whether “powers” refers to the government or to the people.

Extra Credit

The prosodic elements that the author shows on the site illustrate MLK’s emotional state and also the form of his utterances (questions, commands, statements, and so on). The site also shows how MLK develops his points through beautiful metaphors; it is almost as if he sets up a metaphorical framework to make them.

His metaphors add a great deal of hidden emotion and thought-provoking ideas, and they reveal much about Dr. King’s thinking as he wrote.

He was a master at shaping ideas beautifully and conveying them to his audience with extraordinary emotion.