Sentiment Analysis by Machines – Introduction to Text Analysis:

I think machines can read emotions, but definitely not as well as human beings can.

“Sentiment” is a very ambiguous concept. I took CSCI203 last year, and our final project was to analyze the sentiment of tweets. We used a dictionary that maps each word to a floating-point number representing its relative sentiment: positive values mean happiness and negative values mean sadness. The program calculates the sentiment of a tweet by adding up the sentiment values of its words, and then plots the results across the nation based on the tweets’ locations. The results turned out to be roughly accurate and very interesting, and they included both expected and unexpected findings. On second thought, however, this method of calculating sentiment has many problems. As Ramsay asks in his book, “Who decides what sentimentality is?” I am sure there are many sentiment dictionaries online, but I don’t think any of them is accurate enough to serve as a reference for all sentiment analysis. “Sentimentality” itself is an extremely subjective concept that varies from person to person, and building such a dictionary requires a lot of data analysis and data mining. Even if we did manage to create a very accurate dictionary, machines still could not analyze the sentiment of a whole sentence or paragraph as humans do. People often use metaphor, sarcasm, comparison, and so on to convey their sentiments implicitly, which a word-by-word sum cannot capture. Maybe someday machines could be trained through machine learning on enormous amounts of data to read emotions more accurately.
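To make the word-sum method concrete, here is a minimal sketch in Python. It is not the code from the CSCI203 project: the lexicon, its scores, and the two example sentences are all made up for illustration, and the mapping step by location is left out.

```python
import re

# Hypothetical lexicon: these words and scores are invented for illustration.
LEXICON = {
    "love": 0.8,
    "great": 0.7,
    "happy": 0.9,
    "sad": -0.8,
    "hate": -0.9,
    "terrible": -0.7,
}

def score_text(text, lexicon=LEXICON):
    """Sum the lexicon scores of every word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0.0) for word in words)

sincere = "I love this great movie"
sarcastic = "Oh great, I just love waiting in line for hours"

# Both sentences contain "love" and "great", so both get the same
# positive score, even though a human reads the second as a complaint.
print(score_text(sincere))
print(score_text(sarcastic))
```

The two sentences receive the same score because the method only looks at individual words, which is exactly the weakness with sarcasm described above.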

Even we humans can disagree about concepts. In the topic modeling exercise in class, where we colored the words related to “war”, “government”, or both, we could not reach a consensus on what we saw or read. Ramsay states that “‘meaning’ is itself a shifting, culturally located concept incapable of precise definition or stable articulation.” People with different experiences and different cultural and academic backgrounds will understand and interpret the texts they see differently.

Machine sentiment analysis means nothing without human intervention. In the last chapter of his book, Ramsay concludes that “we will have understood computer-based criticism to be what it has always been: human-based criticism with computers”. For a researcher who analyzes the sentiment of a corpus, the results mean nothing if he or she does not really know the corpus: first, it is impossible to judge whether the results are correct; second, he or she cannot fully interpret the results or explain the interesting discoveries and details in them.

The tools I used, such as the topic modeling tool Mallet and the sentiment analysis tool Alchemy, also showed me the downsides of machine reading. I first put my whole corpus of screenplays into Mallet and did not get any useful results (see picture below). It displayed many names and stop words from the screenplays as its list of topics.

Screen Shot 2016-04-03 at 9.27.59 PM

I also put in my corpus of novels. Similar things happened.

Screen Shot 2016-04-03 at 10.26.51 PM

But when I put in a single file, such as the Alan Turing biography, it did show some useful results.

Screen Shot 2016-04-03 at 10.28.57 PM

It displayed many keywords from the book, which gave a rough sense of its main ideas. From my experiment, I tentatively conclude that Mallet is very useful for an individual text file but not for a large, mixed, and complicated corpus.
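One common remedy for topics dominated by names and stop words is to filter those words out before running the topic model. I did not try this in my experiment, so the following is only a sketch in plain Python, not part of Mallet. The word lists and the sample line of dialogue are made up for illustration; a real run would need a full stop word list and the names that actually appear in the corpus.

```python
import re

# Assumed, illustrative lists; far shorter than what a real corpus needs.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "you", "i"}
CHARACTER_NAMES = {"pat", "tiffany", "briony", "robbie"}

def clean_tokens(text, stop_words=STOP_WORDS, names=CHARACTER_NAMES):
    """Lowercase the text, split it into words, and drop stop words and names."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words and t not in names]

# Invented line in screenplay style, where the speaker's name precedes the dialogue.
sample = "PAT: Tiffany, you and I need to practice the dance."
print(clean_tokens(sample))
```

Only the content words of the sample line are left after cleaning, which is the kind of input that should give a topic model a better chance on a mixed corpus.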

As for the sentiment analysis function of Jigsaw, I tested it in the past, and its results corresponded to my understanding of the files.

I have seen the two movies (Silver Linings Playbook and Atonement), and I agree with the sentiment results I got: Atonement is indeed the sadder, more tragic film, whereas Silver Linings Playbook has a happy ending. It is also interesting that there is no negative sentiment in the screenplays.

As for Alchemy, I found it to be a very professional and useful tool for sentiment analysis. Its entity analysis offers more advanced and accurate subtypes and linked data than Jigsaw’s, since Alchemy is a web-based tool and can therefore provide up-to-date information.

A big advantage of Alchemy is that it does not impose a low size limit on the text that can be uploaded, so I could easily upload an entire novel from my novels corpus. For the test, I uploaded Alan Turing: The Enigma to the API to see what result it would give. Before uploading it, I did my own sentiment analysis of the book. I think the main figure, Alan Turing, has mixed sentiment in it, since he successfully broke Enigma but, in order to achieve the ultimate victory, could not let the Germans learn that their messages were being read. After I loaded the novel, Alchemy did mark Alan Turing as having mixed sentiment, which matches my understanding. However, it counted “ALAN TURING”, “Alan Mathison Turing” and “Alan Turing” as three separate entities, which I think is a downside of the Alchemy API. As a web-based tool, it could use online resources to identify entities better. Other than that, I feel the sentiment analysis is very useful because it provides “mixed” and “neutral” labels, which are more informative than “positive” and “negative” alone. It gives more insight into the sentiment of the loaded file, not only for the file as a whole but also for its individual entities. Jigsaw only provides sentiment analysis for the whole document, whereas Alchemy also provides it for each entity it identifies. In this way, we can easily see the sentiment attached to a particular person or thing in the text.

Screen Shot 2016-04-04 at 9.40.19 AM

Screen Shot 2016-04-04 at 10.16.55 AM

Screen Shot 2016-04-24 at 2.12.21 AM

Screen Shot 2016-04-04 at 10.17.24 AM
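The three spellings of Alan Turing that were counted as separate entities could be merged with a small post-processing step. The sketch below is not something the Alchemy API does. It uses a naive heuristic of my own for illustration: two names are treated as the same entity when their first and last words match, ignoring case. It assumes every name is non-empty, and the list of names is only an example.

```python
from collections import defaultdict

def entity_key(name):
    """Naive key: the lowercased first and last word of a non-empty name."""
    parts = name.lower().split()
    return (parts[0], parts[-1])

def merge_entities(names):
    """Group the name variants that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[entity_key(name)].append(name)
    return dict(groups)

# Example list: the three variants from the Alchemy output plus one other name.
names = ["ALAN TURING", "Alan Mathison Turing", "Alan Turing", "Joan Clarke"]
merged = merge_entities(names)
for key, variants in merged.items():
    print(key, variants)
```

This heuristic would wrongly merge two different people who share a first and last name, so it is only a starting point. A more careful approach would rely on the linked data that the tool already attaches to each entity.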

Alchemy also provides taxonomy and emotion sections, which are very accurate and interesting to look at.

Another thing I felt Alchemy could do better is to allow multiple files to be uploaded. Right now it only allows users to enter one text at a time.

After trying the tool, I was curious and searched for Alchemy online, and found that it is an API that uses machine learning (deep learning) to do natural language processing. This may explain why it seemed to provide more accurate analysis than Jigsaw: a model trained on examples of real human language, often labeled by people, is meant to produce sentiment judgments that stay close to those of real human beings.

To test whether Alchemy is relatively accurate, I also loaded the text of the screenplay of The Imitation Game (the movie adapted from Alan Turing: The Enigma) to check whether there are correlations between the two. Below is the result that I got:

Screen Shot 2016-04-24 at 1.46.33 AM

Screen Shot 2016-04-24 at 2.11.59 AM Screen Shot 2016-04-24 at 2.12.10 AM

Alchemy also identifies Alan Turing as having mixed sentiment in the screenplay. But I found really interesting results when I looked at Document Sentiment and Document Emotions. Alchemy gives the screenplay a negative sentiment score, compared to a positive score for the original. It also gives the screenplay an anger score of 0.48445, compared to 0.091749 for the novel. I will look into this result to investigate whether screenplays usually display more extreme or exaggerated sentiment than the original novels.
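To see how far apart the two anger scores are, a quick calculation helps. The two numbers are the ones Alchemy reported; the variable names are my own.

```python
# Anger scores reported by Alchemy for the two documents.
screenplay_anger = 0.48445
novel_anger = 0.091749

difference = screenplay_anger - novel_anger
ratio = screenplay_anger / novel_anger

print(f"difference: {difference:.6f}")  # difference: 0.392701
print(f"ratio: {ratio:.2f}")            # ratio: 5.28
```

The screenplay scores a little more than five times higher on anger than the novel. This is only one pair of documents, so it cannot show by itself that screenplays in general are more extreme.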

Text analysis is a relatively new field, and I believe machines could someday read emotions as well as humans.

Reference:

Ramsay, Stephen. Reading Machines: Toward an Algorithmic Criticism. Urbana: U of Illinois, 2011. Print.
