“Practicum 2 – Finding Digital Texts”

For Class

  1. Browse the online document/image collections listed in the DH Toychest > Data Collections and Datasets > Document/Image Collections section to get a sense of what digital texts are available. Concentrate on texts that are no longer in copyright or that can be used under a Creative Commons license, and that are available in plain-text or HTML format. Look especially at the larger, general-purpose text collections that contain downloadable plain-text, HTML, or XML files to see what is there, e.g.:
    1. EEBO-TCP Texts (see also catalog)
    2. Internet Archive (click on “Download” link on a book page for download format options)
    3. Open Library
    4. Oxford University Text Archive
    5. Project Gutenberg
———————-
  1. Examine the Eighteenth Century Collections Online corpus (2,198 plain-text English documents from Eighteenth Century Collections Online [TCP-ECCO]) (zip file). (The zip file contains the spreadsheet metadata.xls and a folder containing the full text of all the novels in plain-text form.)
  2. Instructions on unzipping (decompressing) zip files: Mac, Windows.
  3. See the spreadsheet (metadata.xls) listing the authors and novels.
  4. Collect a list of 10 sample works (with links to them) that can be worked with in plain-text format and leave it as a souvenir of your experimentation on the course site (go to the page “Practicum 2 – Finding Digital Texts” and create a post there called “Your Name – Finding Digital Texts”).
James Richardson Test Corpus 
Matt Fay’s Test Corpus
AC Li’s Test Corpus
Reed Widdoes Test Corpus
Taylor Yang’s Test Corpus
Allie O’Connell Test Corpus
Tyler Candelora Test Corpus
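
For anyone who would rather unzip the TCP-ECCO corpus with a script than by hand, the following is a minimal Python sketch, not part of the original assignment. It uses only the standard library. The file name "ecco-tcp.zip", the output folder "ecco-tcp", and the helper name extract_corpus are placeholders chosen for illustration, and the sketch assumes the texts inside the zip end in ".txt"; adjust these to match the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_corpus(zip_path, dest_dir):
    """Unzip the archive into dest_dir and return the extracted .txt files, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # rglob also searches subfolders, since the texts sit inside a folder in the zip.
    return sorted(dest.rglob("*.txt"))


if __name__ == "__main__":
    # Placeholder name (an assumption): replace with the name of your downloaded file.
    zip_path = Path("ecco-tcp.zip")
    if zip_path.exists():
        texts = extract_corpus(zip_path, "ecco-tcp")
        print(f"{len(texts)} plain-text files extracted")
        # Show the first ten file names as a starting point for a sample list.
        for path in texts[:10]:
            print(path.name)
    else:
        print(f"{zip_path} not found; download the zip file first")
```

Running the script from the folder that holds the downloaded zip file prints how many plain-text files were extracted and the names of the first ten, which can then be matched against the entries in metadata.xls.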