


Throughout this course we will investigate ways of reading, interpreting, and understanding literature at the macro or "corpus" scale. Our "text" for the class will be a corpus of 3,000 18th- and 19th-century British and American novels, and our goal will be to produce a paper and submit our research for presentation at the 2011 meeting of the Alliance of Digital Humanities Organizations (ADHO) at Stanford.


Today's student of literature must be equally adept at gathering evidence from individual texts and accessing and mining digital text repositories. And "mining" here is the operative word. Literary scholars must learn to go beyond mere search. Today's students and scholars are adept at electronic search and comfortable searching digital collections for stray bits of evidence to fortify a hypothesis or support an argument. But the sheer amounts of literary "data" now available make search ineffectual as a means of thorough evidence gathering.

More interesting and exciting than the mere searching of digital archives is the ability to go beyond search and exploit computation to process and analyze the textual data these corpora contain. In practical terms, this means that the literary scholar must evolve, must embrace new approaches and new methodologies designed for accessing and leveraging the electronic texts that make up the 21st century digital library. This course provides a practical introduction to these methods in the context of a collaborative research project designed and executed by the students in the course.

Research and Collaboration:

In this course we will work collaboratively. Collaborative scholarship is common in the sciences but somewhat foreign to the humanities. That said, in the nascent discipline of "Digital Humanities" (or "Humanities Computing"), where projects are often ambitious and require teams with diverse skill sets, collaboration is becoming a norm rather than an exception. See, for example, Lisa Spiro's recent blog posts about collaboration ("Collaborative Authorship in the Humanities" and "Examples of Collaborative Digital Humanities Projects").


Research Project: Most of the course will revolve around the construction and execution of a collaborative research project. Students will collectively form a research question and design and implement an approach to investigate the problem. There will be various stages in the evolution of the project (see "schedule" below). The critical final deliverable will be a paper proposal which we will submit to the ADHO conference hosts for presentation. The project will account for 40% of your grade.

Exercises: A series of exercises has been prepared to give you hands-on experience with a variety of computational methodologies for analyzing text. No prior programming experience is required or assumed. The exercises are gentle and designed to be enjoyable. Each exercise has a required component and an "extra credit" challenge question. The exercises account for 30% of your grade.

Methodology Reports: Each student will prepare a 15-minute presentation and lead a follow-up discussion on an instructor-assigned topic. Typically this will involve reading and summarizing for the class an essay related to our work in the course. The presentation will account for 10% of your grade.

Collaboration: A collaborative enterprise cannot succeed without commitment and participation. Attendance, punctuality, and participation in class are expected and will constitute 10% of your grade.

Group Evaluation: Each student will write a 250-500 word evaluation of their team. This should be a sort of meta-analysis of the group dynamic: what worked well, how the group self-organized, how the work was distributed, etc. The evaluation will account for 10% of your grade.


  1. T. 9/21: Introduction to Course: Activate your cgi space.

    Th. 9/23: History of Literary and Linguistic Computing: Readings: Horowitz, "Visualizing Big Data: Bar Charts for Words"; Anderson, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete"; Hockey, "History of Humanities Computing"; Jockers, "Beyond Busa: Humanities Computing Today."

  2. T. 9/28 Formalism: Readings: Tynjanov, "On Literary Evolution" (Walser); Ejxenbaum, "The Theory of the Formal Method" (Yamboliev); Shklovsky, "Art as Device" (Sensenbaugh)

    Th. 9/30 The Digital Library of Today and Tomorrow: Guest Lecture by Glen Worthey. Readings: Kelly, "Scan this Book"; Grafton, "Future Reading: Digitization and its Discontents"; Toobin, "Google's Moon Shot"; Hirschorn, "The Hapless Seed"; Crane, "What Do You Do with a Million Books"; Hafner, "History Digitized (and Abridged)"

  3. T. 10/5 Literary Studies: Readings: Rommel, "Literary Studies" (Scheenstra); Hoover, "Quantitative Analysis and Literary Studies" (Harrell); Cummings, "The Text Encoding Initiative and the Study of Literature"

    T. 10/5 Exercise One Due

    Th. 10/7 Reading: Essays by Friedlander and by Oard in the CLIR Report on the Next Generation of Digital Humanities Scholarship; Ramsay, "Algorithmic Criticism" (Boesiger)

  4. T. 10/12 (CHANGE OF VENUE: MEET IN BUILDING 460-401, THE LITERARY LAB). Making the Most of Metadata. Moretti, "Networks" and Slides

    Th. 10/14: Corpus Building and Text Encoding: Readings: Renear, "Text Encoding" (Bautista); Willett, "Electronic Texts: Audiences and Purposes"; Ide, "Preparation and Analysis of Linguistic Corpora" (Quach). Readings: Moretti, "Style, Inc. Reflections on 7,000 Titles" (see also Charts); Jockers, "Beyond the Catalog: Macroanalysis Using Metadata"

    T. 10/12 Exercise Two Due

  5. T. 10/19 Exercise Three Due

    Th. 10/21 Text Processing: Readings: Burrows, "Textual Analysis" (Wertheim); Hajic, "Linguistics Meets Exact Sciences" (Orr)

  6. T. 10/26 Exercise Four Due, Project Work Day

    Th. 10/28 Project Work Day.

  7. T. 11/2.  Exercise Five Due, Lit Lab Presentation Day.

    Th. 11/4 Project Meetings: Group 1, 1:15-2:00; Group 2, 2:00-3:00

  8. T. 11/9 Exercise Six Due. Authorship Attributions and Stylistic Analysis: Readings: Craig, "Stylistic Analysis and Authorship Studies"; Jockers et al., "Reassessing Authorship in the Book of Mormon Using Delta and Nearest Shrunken Centroid Classification"; Jockers and Witten, "A Comparative Study of Machine Learning Methods for Authorship Attribution"

    Th. 11/11 Jockers, "Beyond Style: Novel, Author, Genre, Decade, Gender"

  9. T. 11/16 Project Work Day.

    Th. 11/18 Project Work Day.

  10. T. 11/30 Project Work Day.

    Th. 12/2 Project Work Day.
