machines read poems..?

For my corpus I have chosen poems. I collected texts by four authors, in Russian, English and Latin versions. In total I’m working with nine texts, which should give a representative sample. The corpus starts in Ancient Rome with Horace, continues in the Russian Empire of the XVIII century with Lomonosov and of the XIX century with Pushkin, and ends in the Soviet Union with Vladimir Vysotsky.

There are a lot of poems built on the idea of a poet raising a monument in his own honor to make his work immortal. This tradition began with Horace, and I took the Latin text of his Monument together with its Russian and English translations. Since I’m working with Russian literature, I decided to start with Lomonosov, who raised this topic in Russian poetry in the XVIII century. The next poem is by Pushkin, and with Pushkin the Russian language changes a lot: he is the father of Russian poetry, and he reformed both the language and the style. Even though there is only one century between them, the language and writing style are totally different, and I’m wondering whether the program can detect these differences and show them to us. Last but not least is the song by Vysotsky, in which he is building a monument for his work. Vysotsky represents another style, which we call bard song. He also lived in the XX century, in Soviet times, when the government changed the language into the modern state we speak now.

My final research question is based on the results of Lexos:

lexos

Here you can see that the matching of the Russian translations is logical: Lomonosov and Horace are closely related because Lomonosov was translating Horace. Vysotsky and Pushkin also match because they use modern Russian. The unexpected results were: 1. the style of Vysotsky (who lived in the USSR in the 70s-80s) was also close to Lomonosov (who lived in the Russian Empire in the XVIII century), which I can hardly believe and which I’m going to double check. My guess for now is that the program probably caught the most frequent words (MFW) shared by those two poems, such as “Muse” and other old Greek names and words; 2. Pushkin (who lived in the XIX century) was also close to Vysotsky’s style, and that is at least understandable, because both poems are about building a monument to oneself through one’s works and both poets were writing against the system of their country at the time. For me it’s amazing that the machine could see these details that I hadn’t paid attention to before. This reminded me of the Skype talk with Dr. O’Sullivan, when we were talking about certain authors’ styles and how we can compare them. Of course, most of the comparative analysis will still be done by the researcher, not the machine, using the advantages of close reading. The final question, following from these results and my thoughts, is: why does the machine show a connection where there isn’t supposed to be one? And why does the difference appear only in the English translation?

What can we do to see the real differences? We need to combine close and distant reading methods, translation theory, and the metric systems of both languages. My algorithm was this: mark up the texts in oXygen -> check word collocation in Voyant -> use Poemage to see the rhymes.

oXygen.

You can get more sophisticated about how you want to look at your text in the browser, and about why you want others to look at certain things, too. Pierazzo mentions in her work that marking up the text is an “interactive act”; I would add that it could be “a complete disaster” or “making love with your text”.

Using this tool, we need to know which parts of the text we want to focus on. Here we can come back to Pierazzo and her question: “Which features of the primary source are we to reproduce in order to be sure that we are following ‘best practice’?” Marking up the text is very convenient: you decide how many topics you want to focus on in your text, for instance, only names and places. Then you mark up all the particular words that are related to names and places. But here is the trick: working with poems, I can’t say that this is only “black or white”. Marking up the text is your own personal way of seeing it.

I used only a few tags for my markup (not for the whole poem):

<placeName>

<persName>

<objectType>

<time>

<term type="…"> (metaphor, synonym, allusion, etc.)

<note> (goes outside of poem structure but allows for editorial interpretation.)
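Once a poem is marked up with tags like these, the number of names, places and so on in each version can be counted automatically. What follows is only a minimal sketch under stated assumptions, not the workflow I used in oXygen: the fragment, the wording of its two lines and the function name `count_tags` are invented for illustration, and a real TEI file would also carry the TEI namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment marked up with the tags listed above.
# The wording of the lines is invented, not taken from any of the translations.
SAMPLE = """
<lg>
  <l>The <persName>Muse</persName> will guard my <objectType>monument</objectType>
     in <placeName>Rome</placeName>,</l>
  <l>and <persName>Aquilo</persName> will not destroy it <time>in ages</time>.</l>
</lg>
"""


def count_tags(xml_text):
    """Count how often each element name occurs in the fragment."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


counts = count_tags(SAMPLE)
for tag in ("persName", "placeName", "objectType", "time"):
    print(tag, counts[tag])
```

Running the same count on the Russian and the English version of one poem would give a rough number for how many names or places each translation keeps, which can then be checked by close reading.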

oxehor                                       oxhor

(I’ll email you the rest of the files).

The result in oXygen was unexpected. The Russian translation of Horace has many parts of speech that differ from the English translation, which is why we have more names here. Lomonosov’s poem in English keeps the places and names and at the same time adds new names (Aquilo, Rome, Muse) and a phrase (“My homeland will not keep silent”), whereas in the Russian one only the 5th line differs from Horace: instead of “not wholly die” (ves’) it has “won’t all die” (vovse). Pushkin’s poem in English starts a new tradition by adding a Russian place rather than a Greek or Roman one (Alexander’s column), and he also adds nations, the soul and God. There is no mention of Russia in this translation. Vysotsky’s poem in English totally changes the topic and the idea of the monument itself. Here the monument is the stone on his grave, which people built for his death. He adds sounds, body parts, a coffin maker and also curse words, and gets rid of ancient names and places. In the Russian poem Vysotsky goes far away from the tradition: no ancient names or places are mentioned. He brings up sounds and physical description, but no curse words… He also says “monument”, not “pamyatnik”, as a symbol of his grave and of the people (the government) who wanted to get rid of him. But at the end of the song he comes alive and laughs at all these people who “put him into the grave”…

But why would everything be so different? The problem is in the translation, and marking up the text helped me to see it in more detail. The Russian texts have more “slavyanisms”, etymologically Russian words that can be used for stylization purposes in Russian, but the English translation can’t show this because in English such a layer simply doesn’t exist. For instance, there are two Russian words for “head”: “golova” and “glava”. Both mean the same thing, but in a Russian text the author can use one or the other for stylization or to keep the rhyme. Here we’re dealing with what Antoine Berman would call the deforming tendencies of translation, the study of which he calls the “negative analytic”. Berman says: “The negative analytic is primarily concerned with ethnocentric, annexationist translations and hypertextual translations (pastiche, imitation, adaptation, free writing), where the play of deforming forces is freely exercised”. I think that in my corpus I have the ennoblement type (the creation of elegant sentences) and quantitative impoverishment (lexical loss, destruction of chains of signifiers). For example, in Vysotsky’s poem the first two lines in Russian make no mention of “alive”, while in the English one it appears right in the second line.

Because of that, the word collocations in Voyant are very different:

cirrus (I’ll die, head, part, was, death, lyre, monument, Muse).

cirrus2

Looking at the two word clouds, we can say that the Russian one is more pessimistic and uses the word “death” more, while in the English one we see more words related to “alive”, and it is more positive.
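A word cloud is, at its core, a count of how often each word occurs once the stop-list words are removed. The sketch below shows that idea only; it is not how Voyant itself is implemented, and the stop list, the sample lines and the function name `top_words` are all invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"i", "a", "the", "and", "of", "my", "not", "shall", "is", "и"}


def top_words(text, n=5):
    """Return the n most frequent words that are not on the stop list."""
    # [^\W\d_]+ matches runs of letters in any alphabet, so Cyrillic works too.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(n)


# Invented sample lines, not quotations from the corpus.
english = "I shall not wholly die, my Muse is alive and my monument is alive"
russian = "памятник и смерть, смерть и лира"

print(top_words(english, 3))
print(top_words(russian, 3))
```

With real texts, comparing the two lists side by side is a quick way to check whether words such as “death” or “alive” really dominate one version more than the other.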

When analyzing poems we need to remember the metric system. English has accentual verse, and therefore beats and offbeats (stressed and unstressed syllables) take the place of the long and short syllables of classical systems. In most English verse, the metre can be considered as a sort of back beat, against which natural speech rhythms vary expressively. The most common characteristic feet of English verse are the iamb in two syllables and the anapest in three. Russian has accentual-syllabic verse, which is an extension of accentual verse that fixes both the number of stresses and the number of syllables within a line or stanza. Accentual-syllabic verse is highly regular and therefore easily scannable. Usually, either one metrical foot, or a specific pattern of metrical feet, is used throughout the entire poem; thus we can talk about a poem being in, for example, iambic pentameter… We can see all these differences in Poemage:

poemage1      poemage2

poemage3      poemage4
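The difference between accentual verse (only the number of stresses is fixed) and accentual-syllabic verse (stresses and syllables are both fixed) can be written down as a small check. This is a hedged sketch, not something Poemage does: the scansion strings are annotated by hand, with “x” for an unstressed and “/” for a stressed syllable, and all function names are my own assumptions.

```python
# Metrical feet written as stress patterns: "x" = unstressed, "/" = stressed.
FEET = {"iamb": "x/", "trochee": "/x", "anapest": "xx/", "dactyl": "/xx"}


def expected_pattern(foot, feet_per_line):
    """Build the ideal stress pattern of a line, e.g. iambic pentameter."""
    return FEET[foot] * feet_per_line


def fits_accentual(scansion, stresses):
    """Accentual verse: only the number of stressed syllables is fixed."""
    return scansion.count("/") == stresses


def fits_accentual_syllabic(scansion, foot, feet_per_line):
    """Accentual-syllabic verse: stress positions and syllable count must both fit."""
    return scansion == expected_pattern(foot, feet_per_line)


regular = expected_pattern("iamb", 5)   # ten syllables, five stresses
loose = "x/xx/x/x/x/"                   # one extra unstressed syllable

print(regular)
print(fits_accentual(loose, 5))
print(fits_accentual_syllabic(loose, "iamb", 5))
```

The line with the extra syllable still passes the accentual check but fails the accentual-syllabic one, which is the kind of regularity a translation between the two systems can lose.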

If we take a look at the schemas of these four poems, we’ll see why Lexos “decided” to match, in the English translations, Vysotsky with Lomonosov and Pushkin with Horace: each pair has almost the same assonance and alliteration. That’s why, I think, Lexos divided them into these two strange groups: the words used in each group create the same assonance, alliteration, etc. Basically, the words of the English translations don’t show all the richness of the Russian language.

As Dr. Diane Jakacki said in class, “Imagine you are an eagle, and from the height of your flight you see down in the prairie a little mouse”: this is a very good metaphor for the work that we’re doing in class this semester. I think using digital humanities for text analysis is great and lets you look at the text from a different point of view, the “many eyes” technique. Just as Barthes wrote of the “Death of the Author”, we are now experiencing the “death of the reader”. However, working closely with the tools, I realized that the reader is not dead: he’s very active, he “interacts” with the machine and produces a new text that gives us many answers or leaves us with many questions.

 

References:

Berman, A. (1985/2000) ‘La traduction comme épreuve de l’étranger’, Texte 4 (1985): 67-81, translated by L. Venuti as ‘Translation and the trials of the foreign’, in L. Venuti (ed.) (2000), pp. 284-97.

Clement, T. Text Analysis, Data Mining and Visualisations in Literary Scholarship. Electronic resource: https://dlsanthology.commons.mla.org/text-analysis-data-mining-and-visualizations-in-literary-scholarship/

Pierazzo, E. Diplomatic Reading. Literary and Linguistic Computing, Vol. 26, No. 4, 2011.

 

 


Is the reader dead?!

During these past weeks we learned about a lot of new platforms that we can use for our own text analysis. Using different platforms, you discover new ways of reading and analyzing your corpora. As Dr. Diane Jakacki said in class about distant and close reading: “Imagine you are an eagle, and from the height of your flight you see down in the prairie a little mouse”.

Stylometry and Lexos

Stylometry helps you to explore patterns in texts through stylistic analysis. On the macro level (when we are the eagle looking down from very high), Lexos and its dendrograms are very useful. First of all, Lexos helps you with cleaning your text and editing your stop-word list. For me it was a big surprise that the multicloud can read Russian.

multicloud        multicloud2

As I have short texts, the stop-word list and cleaning the text weren’t a problem for me: main words like “I”, “monument”, “alive”, “die” still appear, so I didn’t change anything there. But the results of this word cloud are still different from the ones I got in Voyant, which means that different platforms also read the same texts “differently” (this was the only option that doesn’t ignore Russian).

Then I decided to see how dendrograms work.

MFWpng

I got the results that I expected: 1. it ignored Russian; 2. Lomonosov’s and Horace’s works were similar, which is how it should be, because Lomonosov was translating Horace into Russian, including the poem Monument. However, the unexpected results were: 1. the style of Vysotsky (who lived in the USSR in the 70s-80s) was also close to Lomonosov (who lived in the Russian Empire in the XVIII century), which I can hardly believe and which I’m going to double check. My guess for now is that the program probably caught the most frequent words (MFW) shared by those two poems, such as “Muse” and other old Greek names and words; 2. Pushkin (who lived in the XIX century) was also close to Vysotsky’s style, and that is at least understandable, because both poems are about building a monument to oneself through one’s works and both poets were writing against the system of their country at the time. For me it’s amazing that the machine could see these details that I hadn’t paid attention to before. This reminded me of the Skype talk with Dr. O’Sullivan, when we were talking about certain authors’ styles and how we can compare them. Of course, most of the comparative analysis will still be done by the researcher, not the machine, using the advantages of close reading (when you can see a little mouse in the prairie).
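To double check a grouping like this, it helps to see what a dendrogram is built from: a profile of most-frequent-word (MFW) frequencies for each text, and a distance between every pair of profiles. The sketch below illustrates that idea with the standard library only. It is an assumption about the general technique, not a description of the settings Lexos actually uses; the toy texts are invented, and it assumes no text is empty.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into words (letters of any alphabet)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def mfw_profiles(texts, n=5):
    """Relative frequencies of the n most frequent words of the whole corpus."""
    tokens = {name: tokenize(text) for name, text in texts.items()}
    corpus_counts = Counter(w for words in tokens.values() for w in words)
    mfw = [word for word, _ in corpus_counts.most_common(n)]
    profiles = {}
    for name, words in tokens.items():
        counts = Counter(words)
        profiles[name] = [counts[w] / len(words) for w in mfw]
    return mfw, profiles


def manhattan(a, b):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Invented toy texts standing in for three poems.
texts = {
    "text_a": "the muse and the monument and the muse",
    "text_b": "the muse and the monument and the lyre",
    "text_c": "my grave my stone my voice my grave",
}

mfw, profiles = mfw_profiles(texts)
for left, right in combinations(texts, 2):
    print(left, right, round(manhattan(profiles[left], profiles[right]), 3))
```

The pair with the smallest distance is the one a clustering would join first. Printing `mfw` shows exactly which words drive the result, so a guess such as “the program caught Muse” can be tested directly.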

TEI – oXygen

You can get more sophisticated about how you want to look at your text in the browser, and about why you want others to look at certain things, too. Pierazzo mentions in her work that marking up the text is an “interactive act”; I would add that it could be “a complete disaster” or “making love with your text”.

Using this tool, we need to know which parts of the text we want to focus on. Here we can come back to Pierazzo and her question: “Which features of the primary source are we to reproduce in order to be sure that we are following ‘best practice’?” Marking up the text is very convenient: you decide how many topics you want to focus on in your text, for instance, only names and places. Then you mark up all the particular words that are related to names and places. But here is the trick: working with poems, I can’t say that this is only “black or white”. Marking up the text is your own personal way of seeing it. In my corpus I differentiate several topics: life, death, monument, glory, time, nation, sound, etc. Each author talks about them differently and brings up new metaphors for them, and I need to rely only on my “feeling” for the text and my background knowledge. After we worked in class on Keats’s poem, I decided to mark up my texts, but I wasn’t very successful with that 🙁

My final thought about the work we’ve done these past two weeks was: “Is the reader dead?” Just as Barthes wrote of the “Death of the Author”, we are now experiencing the “death of the reader”. However, working closely with the tools, I realized that the reader is not dead: he’s very active, he “interacts” with the machine and produces a new text that gives us many answers or leaves us with many questions.

 

emotions: humans vs machines

Does a machine have emotions? A machine does not have emotions; however, it is able to read them. Topic modeling searches a text for clusters of words that you can group into certain categories based on similarities: the computer compares the occurrence of topics within a document to how a word has been assigned in other documents to find the best match. Here is my topic modeling of Hemingway’s The Old Man and the Sea:

tp

I believe that the most important part of doing topic modeling (and sentiment analysis) is your familiarity with the text. If you do not know the text well, you won’t be able to spot a mistake in the results caused by a wrong algorithm. My own topic modeling table:

topicmodeling

Sentiment analysis treats the task as a classification problem: using certain rules, the algorithm categorizes words by sentiment. For instance, Alchemy API gives us a full sentiment analysis in which key words are labeled with a sentiment (negative, positive, neutral). If you compare the results in my table with the results from Alchemy API, you will not find a big difference:

senanvy

According to the program, the poem by Vysotsky has more negative sentiments, which means that it is a “sad” or even “angry” poem. However, it still does not show you all the “shades” of the text. For us that means that we need to be careful every time we use these programs and their results, and not just blindly trust them.

The positive side of this method (as with machine programs in general) is that it does all the matching and topic modeling for us, so if you have a huge text you do not need to spend time doing everything by hand. It also gives us the “many eyes” effect, which helps us to find a new reading of the text. However, in my view there are more negative sides than positive ones. The negative sides: double work, since once the matching is done you need to double check it afterwards. You need to create a stop list for each text, because if you have the word “not” on your stop list and the text contains “not happy”, you will lose the meaning of “not happy” and be left with only “happy”. You also need to check the dictionary to see which words are assigned to an angry or calm mood; maybe for your specific text you would like to change words in the dictionary too.
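The “not happy” problem is easy to reproduce. Here is a minimal sketch of a dictionary-based sentiment score, assuming a tiny made-up dictionary and a simple rule that a negation flips the next scored word. It is not how Alchemy API works internally; the names `LEXICON`, `NEGATIONS` and `score` are hypothetical.

```python
import re

# Tiny hypothetical sentiment dictionary: word -> score.
LEXICON = {"happy": 1, "alive": 1, "sad": -1, "grave": -1}
NEGATIONS = {"not", "no", "never"}


def score(text, stop_words=frozenset()):
    """Sum word scores; a negation flips the sign of the next scored word."""
    words = [
        w for w in re.findall(r"[^\W\d_]+", text.lower())
        if w not in stop_words
    ]
    total, negate = 0, False
    for word in words:
        if word in NEGATIONS:
            negate = True
        elif word in LEXICON:
            total += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    return total


print(score("I am not happy"))                      # "not" is kept
print(score("I am not happy", stop_words={"not"}))  # "not" removed by the stop list
```

The same sentence gets a negative score when “not” is kept and a positive one when the stop list removes it, which is why the stop list has to be checked for every text.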

I believe that machines can read emotions if we “teach” them how to do so. The problem is that with math we can rely on machine results because it is only numbers, but we cannot do the same with text. In math the answer is clear: we have a precise algorithm where X+Y=1, whereas in literature we cannot always say that, since the meaning of a text can be changed by even one comma. For me, doing sentiment analysis with a program is the same as using Google Translate: it’s good, but better to check.