Digital humanities (DH) is one of the most exciting fields of scholarly research right now. DH has many different aspects, but perhaps the most promising (and most discussed) is the machine analysis of text.
Proponents of data mining herald the approach for its alleged potential to close the gap between the “two cultures” of the humanities and the hard sciences by allowing us to subject historical texts to quantitative analysis. Traditionally, humanities research has been largely anecdotal, which has allowed researchers to cherry-pick a few anecdotes to support a pet thesis. By allowing us to survey many cases quickly, data mining can help us determine whether the anecdotes selected by other historians were statistically representative.
Critics of data mining have argued that the massive investments in DH technology have so far produced few surprising results and that DH is all a bunch of techno-hype designed to extract funding from gullible research councils. These critics have a point: some of the recent efforts to use computers to quantitatively analyse primary sources have ended up just stating the obvious.
For instance, researchers at the University of Richmond received a grant that allowed them to analyse the hundreds of speeches delivered in Virginia at the start of the Civil War. (Read more here.) Their keyword searches found frequent references to “slavery” in the debates on secession. The researchers then concluded that issues related to slavery were a major motivating factor in Virginia’s decision to leave the Union. Of course, this isn’t telling us anything new. We’ve known for a long time that the American Civil War was about slavery. File this research under: “No kidding, Sherlock!”
However, one occasionally comes across a data mining project that fundamentally undermines the scholarly consensus about a particular historical topic. The New York Times recently reported on a project by William Turkel, a historian at the University of Western Ontario. He teaches Canadian history, “environmental and public history, the histories of science and technology, ‘big history’, STS, computation, and studies of place and social memory.” Turkel is something of a polymath, and a few years ago he constructed a 3-D printer in his Lab for Humanistic Fabrication.
For a historian with such advanced technical skills, doing machine analysis of primary sources would be relatively easy, I would imagine.
Turkel is a member of the Criminal Intent project, which landed a grant from the prestigious Digging into Data programme. Digging into Data is jointly funded by research councils in a number of countries: the JISC, the Arts and Humanities Research Council, and the Economic and Social Research Council in the UK; the Institute of Museum and Library Services, the National Endowment for the Humanities, and the National Science Foundation in the US; the Netherlands Organisation for Scientific Research; and the Social Sciences and Humanities Research Council in Canada.
Working with Tim Hitchcock of the University of Hertfordshire, Turkel recently analysed the transcribed court records that had been put online by the Old Bailey Project. The Old Bailey Project, which has involved digitizing and transcribing records of 198,000 trials between 1674 and 1913, is one of the best-known DH initiatives. (The Old Bailey is the central criminal court in London.)
Image: The Old Bailey in 1808
Here is how the New York Times reported their research findings:
After scouring the 127 million words in the database for patterns in a project called Data Mining With Criminal Intent, he [Tim Hitchcock] and William J. Turkel, a historian at the University of Western Ontario, came up with a novel discovery. Beginning in 1825 they noticed an unusual jump in the number of guilty pleas and the number of very short trials. Before then most of the accused proclaimed their innocence and received full trials. By 1850, however, one-third of all cases involved guilty pleas. Trials, with their uncertain outcomes, were gradually crowded out by a system in which defendants pleaded guilty outside of the courtroom, they said.
Conventional histories cite the mid-1700s as the turning point in the development of the modern adversarial system of justice in England and Colonial America, with defense lawyers and prosecutors facing off in court, Mr. Hitchcock and Mr. Turkel said. Their analysis tells a different story, however.
Dan Cohen, a historian of science at George Mason University and the lead United States researcher on the Criminal Intent project, found other revelations in the data. He noticed that in the 1700s there were nearly equal numbers of male and female defendants, but that a century later men outnumbered women nearly 10 to 1.
The Criminal Intent project shows that data mining can indeed advance our understanding of the past beyond what we already know from conventional historical research.
You can read Turkel’s blog here.
UCLA recently inaugurated a programme in the digital humanities. See more here.