Digging Into Data | The Past Speaks

Trading Consequences

14 02 2012

A few weeks ago, I mentioned on this blog that a team of historians in Canada and Scotland had won a big grant from the Digging Into Data challenge. The blog related to their project, Trading Consequences, is now up and running.

Trading Consequences will “investigate the environmental and economic histories of the rapid expansion of commodity frontiers and trade in the British Empire and Canada during the nineteenth century…This collaborative project between environmental historians in Canada and computational linguistics and computer science scholars in the UK will use text mining techniques to explore hundreds of thousands of pages of historical documents related to trade in the British Empire during the nineteenth century. Although our research will have a global scope, it will particularly emphasize the role of Canadian natural resources in the network of commodity flows.”

This project sounds fascinating and I’m certain that I will find ways to take advantage of all of the data they will be making available to the public.

Teams:

York University, Canada: Prof Colin Coates (PI), Dr Jim Clifford, Prof Gillian McGillivray

University of Edinburgh, UK: Prof Ewan Klein (PI), Dr Claire Grover, Dr Beatrice Alex, Dr James Reid (EDINA)

University of St Andrews, UK: Prof Aaron Quigley (PI)

Canadian Economic Historians Win Big Grants From Digging Into Data

5 01 2012

Back in March 2011, I posted to this blog about Round 2 of the Digging into Data project, a joint initiative of national funding councils in several Western nations. The first round of the Digging into Data Challenge sparked enormous interest from the international research community and led to eight cutting-edge projects being funded. There has also been increased media attention to the question of so-called “big data” techniques being used for humanities and social sciences research, including a recent cover article in the journal Science.

In June 2011, teams of scholars submitted their proposals for Round 2. These teams were interdisciplinary and included academics in more than one of the participating countries.

Today, we learned that two of the winners of this hyper-competitive grant contest were teams that plan to work on topics connected to Canadian economic history.

The first of these teams is led by the Canadian economic historian Kris Inwood. Inwood’s project will involve linking together census data from Canada, the United States, and the United Kingdom from the 19th and early 20th centuries in ways that will allow researchers to track the migration of individuals and examine the effects of economic opportunity, mobility and health on social structures in Europe and North America. The Canadian data will come from censuses taken between 1851 and 1911.

The second of these teams includes Colin Coates of York University. Its project is called Trading Consequences.

Description: (Taken from the Digging into Data press release) This project will examine the economic and environmental consequences of commodity trading during the nineteenth century. The project team will be using information extraction techniques to study large corpora of digitized documents from the nineteenth century. This innovative digital resource will allow historians to discover novel patterns and to explore new hypotheses, both through structured query and through a variety of visualization tools.

The eight sponsoring funding bodies include the Arts & Humanities Research Council (United Kingdom), the Economic & Social Research Council (United Kingdom), the Institute of Museum and Library Services (United States), the Joint Information Systems Committee (United Kingdom), the National Endowment for the Humanities (United States), the National Science Foundation (United States), the Netherlands Organisation for Scientific Research (Netherlands), and the Social Sciences and Humanities Research Council (Canada).


The Promise of Digital Humanities

21 08 2011

Digital humanities (DH) is one of the most exciting fields of scholarly research right now. DH has many different aspects, but perhaps the most promising (and most discussed) is the machine analysis of text.

Proponents of data mining herald the approach for its alleged potential to close the gap between the “two cultures” of the humanities and the hard sciences by allowing us to subject historical texts to quantitative analysis. Traditionally, humanities research has been largely anecdotal, which has allowed researchers to “cherry pick” a few anecdotes to prove their pet thesis. By allowing us to survey many cases quickly, data mining can help us to determine whether the anecdotes selected by other historians were statistically representative.

Critics of data mining have argued that the massive investments in DH technology have so far produced few surprising results and that DH is all a bunch of techno-hype designed to extract funding from gullible research councils. These critics have a point: some of the recent efforts to use computers to quantitatively analyse primary sources have ended up just stating the obvious.

For instance, researchers at the University of Richmond got a grant that allowed them to analyse the hundreds of speeches delivered in Virginia at the start of the Civil War. (Read more here). Their keyword searches found that there were frequent references to “slavery” in the debates on secession. The researchers then concluded that issues related to slavery were a major motivating factor for Virginia’s decision to leave the Union. Of course, this isn’t telling us anything new. We’ve known for a long time that the American Civil War was about slavery. File this research under: “No kidding, Sherlock!”

However, one occasionally comes across a data mining project that fundamentally undermines the scholarly consensus about a particular historical topic. The New York Times recently reported on a project by William Turkel, a historian at the University of Western Ontario. He teaches Canadian history, “environmental and public history, the histories of science and technology, ‘big history’, STS, computation, and studies of place and social memory.” Turkel is something of a polymath and a few years ago he constructed a 3-D printer in his Lab for Humanistic Fabrication.

For a historian with such advanced technical skills, doing machine analysis of primary sources would be relatively easy, I would imagine.

Turkel is a member of the Criminal Intent project, which landed a grant from the prestigious Digging into Data programme. Digging into Data is jointly funded by research councils in a number of countries, including the JISC, the Arts and Humanities Research Council, the Economic and Social Research Council in the UK; the Institute of Museum and Library Services, the National Endowment for the Humanities and the National Science Foundation in the US; the Netherlands Organisation for Scientific Research; and the Social Sciences and Humanities Research Council in Canada.

Working with Tim Hitchcock of the University of Hertfordshire, Turkel recently analysed the transcribed court records that had been put online by the Old Bailey Project. The project, which has involved digitizing and transcribing records of 198,000 trials between 1674 and 1913, is one of the best-known DH initiatives. (The Old Bailey is the central criminal court in London.)

Old Bailey in 1808

Here is how the New York Times reported their research findings.

After scouring the 127 million words in the database for patterns in a project called Data Mining With Criminal Intent, he and William J. Turkel, a historian at the University of Western Ontario, came up with a novel discovery. Beginning in 1825 they noticed an unusual jump in the number of guilty pleas and the number of very short trials. Before then most of the accused proclaimed their innocence and received full trials. By 1850, however, one-third of all cases involved guilty pleas. Trials, with their uncertain outcomes, were gradually crowded out by a system in which defendants pleaded guilty outside of the courtroom, they said.

Conventional histories cite the mid-1700s as the turning point in the development of the modern adversarial system of justice in England and Colonial America, with defense lawyers and prosecutors facing off in court, Mr. Hitchcock and Mr. Turkel said. Their analysis tells a different story, however.

Dan Cohen, a historian of science at George Mason University and the lead United States researcher on the Criminal Intent project, found other revelations in the data. He noticed that in the 1700s there were nearly equal numbers of male and female defendants, but that a century later men outnumbered women nearly 10 to 1.

The Criminal Intent project shows that data mining can indeed advance our understanding of the past beyond what we already know from conventional historical research.
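The guilty-plea finding comes down to a simple tally: group trials by period and compute the share that ended in a guilty plea. Below is a minimal sketch of that kind of count, assuming each trial has already been reduced to a (year, plea) pair. The function name and the sample records are invented for illustration; they are not Old Bailey data and this is not the Criminal Intent team's actual code.

```python
from collections import defaultdict


def guilty_plea_share_by_decade(trials):
    """Return {decade: share of trials in that decade with a guilty plea}.

    `trials` is an iterable of (year, plea) pairs, where plea is the
    string "guilty" or "not guilty" (a simplifying assumption).
    """
    totals = defaultdict(int)
    guilty = defaultdict(int)
    for year, plea in trials:
        decade = (year // 10) * 10  # e.g. 1826 -> 1820
        totals[decade] += 1
        if plea == "guilty":
            guilty[decade] += 1
    return {d: guilty[d] / totals[d] for d in sorted(totals)}


# Hypothetical toy records, made up purely to exercise the function.
sample_trials = [
    (1815, "not guilty"), (1818, "not guilty"),
    (1826, "guilty"), (1827, "not guilty"), (1829, "not guilty"),
    (1851, "guilty"), (1853, "not guilty"), (1855, "not guilty"),
]

shares = guilty_plea_share_by_decade(sample_trials)
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%} guilty pleas")
```

The hard part of a project like this is getting 198,000 transcribed trials into a shape where each one has a reliable date and plea; once that is done, the counting itself is short.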

You can read Turkel’s blog here.

P.S.

UCLA recently inaugurated a programme in the digital humanities. See more here.


Eight International Research Councils Announce Round Two of the Digging into Data Challenge

31 03 2011

An agreement among eight research funders has produced a new opportunity for scholars interested in digital humanities and social sciences called Digging Into Data. The deadline for submissions for Round 2 is June 16. I have an idea for a project that would be eligible for Digging Into Data funding, but I wouldn’t want to take the lead in writing a grant proposal. So I am going to just outline the idea here (see below) and then ask scholars who would be interested in developing the idea still further to contact me.

From the press release:

Today, eight international research funders are jointly announcing their participation in round two of the Digging into Data Challenge, a grant competition designed to spur cutting edge research in the humanities and social sciences. The Digging into Data Challenge asks researchers these provocative questions: How can we use advanced computation to change the nature of our research methods? That is, now that the objects of study for researchers in the humanities and social sciences, including books, survey data, economic data, newspapers, music, and other scholarly and scientific resources are being digitized at a huge scale, how does this change the very nature of our research? How might advanced computation and data analysis techniques help researchers use these materials to ask new questions about and gain new insights into our world?

The first round of the Digging into Data Challenge sparked enormous interest from the international research community and led to eight cutting-edge projects being funded. There has also been increased media attention to the question of so-called “big data” techniques being used for humanities and social sciences research, including a recent cover article in the journal Science.

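The eight sponsoring funding bodies for Round Two of Digging into Data are: the Arts & Humanities Research Council (United Kingdom), the Economic & Social Research Council (United Kingdom), the Institute of Museum and Library Services (United States), the Joint Information Systems Committee (United Kingdom), the National Endowment for the Humanities (United States), the National Science Foundation (United States), the Netherlands Organisation for Scientific Research (Netherlands), and the Social Sciences and Humanities Research Council (Canada).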

Individual submissions are not allowed here. Teams of scholars have to submit proposals. The teams have to be interdisciplinary and include academics in more than one of the participating countries.

Final applications will be due June 16, 2011. Further information about the competition and the application process can be found at http://www.diggingintodata.org.

Ok, so here is my idea. We have pretty good records for the daily prices of government bonds traded on the London and Amsterdam exchanges for the 18th and 19th centuries. From the late 19th century, we have good data for the prices of corporate bonds and equities (e.g., shares of US railroads). Digitizing all of this data so that people can do really robust quantitative analysis would take lots of work, but with a big budget to hire RAs, you could do it and then put the results online. The thing I am really interested in is how information flows influenced the price of bonds: before undersea telegraphs, it took a long time for news of, say, a military reversal overseas to reach bond markets. There has been some great research done analysing historical bond yields and news of political and military events, but with a big dataset we could do lots more. In fact, there are probably uses of this data that haven’t even occurred to me.
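To make the information-lag idea a bit more concrete, here is a minimal sketch of the kind of calculation a digitized price series would allow: take the date of an overseas event, add an assumed number of days for the news to travel, and compare the bond price just before and just after the news arrives. Everything in it is hypothetical, including the function name, the ten-day lag and the prices; none of it comes from a real dataset.

```python
from datetime import date, timedelta


def price_change_around_arrival(prices, event_day, news_lag_days, window=1):
    """Percentage price change around the day the news reaches the market.

    prices        -- dict mapping date -> closing price (assumed layout)
    event_day     -- date on which the overseas event happened
    news_lag_days -- assumed travel time of the news, in days
    window        -- days before/after arrival used for the comparison
    Returns (arrival_day, percent_change).
    """
    arrival = event_day + timedelta(days=news_lag_days)
    before = prices[arrival - timedelta(days=window)]
    after = prices[arrival + timedelta(days=window)]
    return arrival, 100.0 * (after - before) / before


# Hypothetical prices for a government bond, invented for illustration.
toy_prices = {
    date(1800, 6, 10): 100.0,
    date(1800, 6, 11): 96.0,
    date(1800, 6, 12): 95.0,
}

arrival, change = price_change_around_arrival(
    toy_prices, event_day=date(1800, 6, 1), news_lag_days=10
)
print(f"News arrives {arrival}; price moves {change:.1f}%")
```

A real analysis would need to handle days on which the exchange did not trade (the lookup above simply raises a KeyError for a missing date) and would have to estimate the news lag from shipping or newspaper records rather than assume it.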

Anyone interested in applying to Digging Into Data to do something along these lines is welcome to contact me. Given my other commitments, I probably couldn’t be part of the team submitting a bid, but I would love it if someone could take up this idea, since I would find the final result (the database) to be really useful. The team creating this project would likely include historians, economists with a really strong stats background, and computer scientists and digital humanities experts.

I’m kinda thinking aloud here, so I apologize for this inconclusive post.

References:

Weidenmier, Marc. 2000. “The Market for Confederate Cotton Bonds”. Explorations in Economic History. 37 (1)

Frey, Bruno S., and Daniel Waldenström. 2007. Using financial markets to analyze history: the case of the Second World War. Zurich: Inst. for Empirical Research in Economics.

