Initial Thoughts
Creating a corpus is not the easiest project; it requires time, patience, and a lot of effort. The process begins by asking which general question you would like to research. At first, it can be daunting to narrow your research to a specific category. I started by thinking about authors and genres that I enjoy reading. I wrote down all my ideas and considered the different approaches I could take. Eventually, I thought about researching certain literary writers – Emily Dickinson, Shakespeare, Thoreau, Emerson. Then, I picked out philosophical interests I have – Existentialism, Objectivism, Platonism. However, I wanted to save those ideas for a later project.
Instead, I decided to create a corpus of all the readings from my Comparative Humanities class, HUMN 150 Enlightenments. I also want to compare the new syllabus from 2015 to the first syllabus from 2000. My goal is to see how the course has changed in its genres, the gender of its authors, and the number of texts. I also hope to do a sentiment analysis to see whether most of our texts are positive, negative, or neutral.
HUMN 150 highlights some of the most important intellectual, political, and literary trends from the European Renaissance to the beginnings of “modernity” in the late 19th century. The 2015 syllabus includes ten books along with fourteen supplementary readings.
The books on the 2015 syllabus:
- Oration on the Dignity of Man by Giovanni Pico della Mirandola
- The Prince by Niccolò Machiavelli
- The Essential Galileo by Galileo Galilei
- The Narrow Road to the Deep North by Basho
- Discourse on the Origin of Inequality by Jean-Jacques Rousseau
- Frankenstein by Mary Shelley
- A Narrative of the Life of Frederick Douglass by Frederick Douglass
- The Communist Manifesto by Karl Marx and Friedrich Engels
- The Origin of Species and The Descent of Man by Charles Darwin
- The Home and the World by Rabindranath Tagore
It was extremely difficult to find the supplementary readings online, so I asked Professor Shields, my HUMN 150 instructor, for a PDF copy of each reading. Then, I was able to get Oration on the Dignity of Man from Professor Faull as a Word document. For the other nine books I used Project Gutenberg, a digital library of free eBooks. However, there were issues with the books from Project Gutenberg: they are not the exact translations and editions we are using in class. This could have a meaningful impact on my research.
Problems with Wrong Translations
- Word cloud: the frequency of words could differ depending on each translator’s lexicon and style.
- Word density: it will differ depending on the number of words in each text.
- Rhetoric: there will be disparities if I choose to look at literary devices like alliteration, anaphora, and allusion.
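One way to soften the word density problem is to compare relative frequencies instead of raw counts, so that a longer translation does not look different just because it has more words. Here is a minimal sketch of that idea in Python. The function name `relative_frequencies`, the per-1,000-words scale, and the sample sentences are my own assumptions for illustration, not part of the project's actual tooling or texts.

```python
import re
from collections import Counter


def relative_frequencies(text, per=1000):
    """Return each word's frequency per `per` words of the text.

    Hypothetical helper: normalizing by text length makes two
    translations of different lengths easier to compare.
    """
    # Lowercase and keep runs of letters only (Unicode-aware, no digits).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: count * per / total for word, count in counts.items()}


# Invented sample sentences, not quotations from any real translation.
translation_a = "Fortune favors the bold the bold"
translation_b = "Luck is on the side of the daring and the brave"

freq_a = relative_frequencies(translation_a)
freq_b = relative_frequencies(translation_b)

# Compare how often "the" appears per 1,000 words in each version.
print(round(freq_a["the"], 1), round(freq_b["the"], 1))
```

This only addresses length. Differences that come from the translator's word choices would still show up in the results and would need to be noted alongside them.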
Current Creation
Currently, I am cleaning and parsing my texts. I am almost finished removing filler words, page numbers, and extra spaces. Also, I need to scrape the PDF files of the supplementary readings. Then, I need to OCR the old syllabus’s texts. After I finish those three processes, I will have my corpus completed and prepared for textual analysis.
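The cleaning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `clean_text` and the tiny `STOP_WORDS` set are hypothetical, and it treats any line made up only of digits as a page number, which may not hold for every edition.

```python
import re

# Hypothetical, deliberately tiny list of filler words; a real project
# would use a fuller list suited to the corpus.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def clean_text(raw):
    """Remove page-number lines, filler words, and extra spaces."""
    # Drop lines that contain nothing but a number, e.g. "  42  ".
    lines = [
        line for line in raw.splitlines()
        if not re.fullmatch(r"\s*\d+\s*", line)
    ]
    # Lowercase and keep only words (letters, with an optional
    # straight apostrophe inside); this also discards punctuation.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", " ".join(lines).lower())
    # Joining with single spaces removes any extra whitespace.
    return " ".join(t for t in tokens if t not in STOP_WORDS)


sample = "The  Prince\n 12 \nis a   book of politics"
print(clean_text(sample))
```

Because this sketch lowercases everything and strips punctuation, it suits word counts and word clouds better than an analysis of literary devices, which would need the original punctuation and line breaks.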
Finalizing My Corpus
I am keeping a metadata sheet of all the data I am using on Google Sheets. I know it is crucial to pay attention to detail and keep myself organized throughout this process. I will be thinking about the translation and edition problems along the way, and I will record my findings appropriately. I hope to discover gaps within the two syllabi, such as missing genres and gender preferences in authors. Also, I would like to concentrate on certain texts like Oration on the Dignity of Man and Leonardo da Vinci’s Notebooks to illustrate similarities and differences between the thought processes of both writers. Even though creating a corpus is an arduous task, the new discoveries you can achieve can be groundbreaking.