Scripto: New Open Source Software for Creating Crowdsourced Transcription Websites

1 01 2011

I have written before about crowdsourcing the transcription of primary sources, and about Transcribe Bentham in particular. I would now like to draw your attention to Scripto, new open-source software that allows archives and libraries to crowdsource the transcription of archival materials. This information was sent to me by Prof. Sharon Leon of George Mason University, the head of the project, and I am taking the liberty of re-posting it here.

The lead programmer for Scripto is Jim Safley, who is Web Programmer and Digital Archivist for the Center for History and New Media (CHNM). He received his undergraduate degree in history at GMU and is currently working towards his master’s degree in American history. Beginning his archiving career in 1999 at the National Archives and Records Administration, Jim moved through several related positions, including records manager at Phi Beta Kappa national headquarters and archivist assistant at GMU’s Special Collections and Archives. Arriving at CHNM in 2002, Safley applied his traditional archiving experience to his work in digital archiving, web programming, and database administration. His interests include metadata standards, database design, web technologies, progressive history and history of technology. Safley was involved in developing the September 11 Digital Archive.

The Scripto software is currently being developed by the Center for History and New Media at George Mason University for its project to transcribe the Papers of the War Department, 1784-1800. The software will then be made available for others to use and, if they wish, modify. The Center will launch the tool for crowdsourced transcription toward the end of January. After that point, they will begin writing connector scripts for the tool so that it can be used with common content management systems (Omeka, Drupal, WordPress, etc.).

Scripto uses the MediaWiki API and editing interface, plus some additional scripting, to capture the transcriptions and pass them back to the CMS. It thus provides all of the versioning and notation capacities of MediaWiki, while making the current version of the transcription available to the main CMS for search and for association with the rest of the standardized archival metadata. This is one of the differences between Scripto and the system that Transcribe Bentham is using: the Bentham project is totally contained within the MediaWiki interface and has no way to export standardized metadata. Additionally, the Transcribe Bentham project has created an interface for TEI (Text Encoding Initiative) mark-up of the texts. The people at the Scripto project have not added this modification to their use of MediaWiki, but since the tool is open source, another programmer could add it on an individual basis or release it as a plugin for the system.
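To make that round trip a little more concrete, here is a minimal sketch in Python of how a script might ask a MediaWiki installation for the current text of a transcription page and attach it to a CMS record. This is not Scripto's actual code, and I have not seen how the project implements this step. The endpoint URL, the page title, and the record fields below are all hypothetical, and the response layout assumes a reasonably recent MediaWiki API (`formatversion=2` with revision slots).

```python
import json
from urllib.parse import urlencode


def build_query_url(api_base, page_title):
    """Build a MediaWiki API URL asking for the latest revision text of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": page_title,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }
    return api_base + "?" + urlencode(params)


def extract_current_text(response_json):
    """Pull the wikitext of the latest revision out of an API response string.

    Returns None if the page is missing or has no revisions.
    """
    data = json.loads(response_json)
    pages = data.get("query", {}).get("pages", [])
    if not pages or "revisions" not in pages[0]:
        return None
    return pages[0]["revisions"][0]["slots"]["main"]["content"]


def attach_transcription(cms_record, transcription):
    """Return a copy of a (hypothetical) CMS record with the transcription added."""
    updated = dict(cms_record)
    updated["transcription"] = transcription
    return updated


# Hypothetical endpoint and page title, for illustration only.
url = build_query_url("https://wiki.example.org/w/api.php", "Document 42/Page 1")

# A hand-written sample of what such a response could look like.
sample_response = json.dumps({
    "query": {
        "pages": [{
            "pageid": 7,
            "title": "Document 42/Page 1",
            "revisions": [{
                "slots": {"main": {"contentmodel": "wikitext",
                                   "content": "Sir, I have the honor to report"}}
            }],
        }]
    }
})

record = attach_transcription(
    {"identifier": "doc-42", "title": "Letter to the Secretary at War"},
    extract_current_text(sample_response),
)
```

The point of the sketch is the division of labour: the wiki keeps the full revision history, while the CMS record only ever holds the latest text alongside its existing metadata, which is what makes the transcription searchable there.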

Primary Sources Are Going Online

29 11 2010


Library and Archives Canada Building, 395 Wellington Street, Ottawa, Ontario

Library and Archives Canada recently completed the digitization of the papers of Sir John A. Macdonald.

Macdonald in 1883. Image from LAC. Mikan: 3218716


"Come Into My Office" Image of the Office of Sir John A. Macdonald

Previously, scholars wishing to look at the correspondence of Macdonald had to look at microfilms of the originals. There is now a database online that allows you to download images of the correspondence in PDF format.

The search engine for the Macdonald correspondence looks like this:

I have pasted an image of an actual document in the Macdonald correspondence below. In this case, it is a rare letter that Laurier sent to Macdonald.

Laurier to Macdonald, 7 February 1884

LAC’s wonderful decision to put the Macdonald papers online is part of a growing trend to digitize primary sources and place them online. The Library of Congress has put Abraham Lincoln’s Papers online. See here.

The wonderful thing about the LoC’s Lincoln Papers search engine is that you can view both images of the primary sources and plain text transcriptions of each item of correspondence. For instance, I found this letter from a private citizen in Canada to Lincoln dated 25 Feb 1863.

Here is the transcription of the letter, which was completed by the folks at the Lincoln Studies Center, Knox College, Galesburg, Illinois.

P. Tertius Kempson to Abraham Lincoln, Wednesday, February 25, 1863 (Support and autograph request from Canada; endorsed by Elbridge G. Spaulding)

From P. Tertius Kempson to Abraham Lincoln, February 25, 1863

Fort Erie C. W.

Feby 25th 1863.

Honoured Sir,

Englishmen and Canadians are charged that their sympathies have been with the Southern Rebellion and Slavery and my cheeks flush with shame for my countrymen, when I own that this has been too much the case– Thank God, there are numerous glorious exceptions and as a proof of this I take the liberty of sending you a Copy of a Speech delivered recently by the foremost man in Canada and I am happy in being able to assure you that it contains the sentiments and views of thousands of Canadians and millions of British Subjects;

Yes! honoured Sir, you have our earnest and most constant prayers that you may entirely succeed in ridding the Great and Glorious Union of the foul Canker worm of Slavery.

I had the honour and happiness of a personal introduction to you when you passed through Buffalo; May I ask you to enable me to perpetuate the remembrance of yourself and the honour I then enjoyed by giving me a line or two in autograph that I may be able to leave to my children & my childrens children, as a heir loom in remembrance of the great apostle of Liberty of the 19th Century–

By confering upon me this small favor, I shall ever be yours most respectfully & gratefully

P. Tertius Kempson

Another wonderful recent initiative is the Transcribe Bentham project, which seeks to transcribe the papers of Jeremy Bentham, the great philosopher. In this case, the transcription is being done by crowdsourcing. Images of all of the correspondence in the Bentham collection were placed online on a website that allows interested members of the public to try their hands at transcribing the documents. The results are monitored by trained archivists and paleographers to maintain quality control.

Transcribe Bentham Project

Transcription in a Digital World

28 09 2010

I would like to bring your attention to an excellent post on the ActiveHistory blog about digital transcription. The post is by Krista McCracken, a public history consultant who is currently working as a Digitization Facilitator for Knowledge Ontario. She begins her post with this:

“You are cleaning out the attic of your house and find a diary from the early 1900s written by a distant relative.  What do you do with the diary? How do you make it useful to the general public? Donating it to a museum or an archive is a good start.  However, in order for the diary to be useful to a wider audience it needs to be transcribed.  A transcribed document can be made full text searchable, copies can be made of the text, and the entire document becomes accessible to a wider audience.  Transcription can be a time consuming and a painstaking process.   But, once a document has been transcribed its usefulness increases exponentially.”

Krista shares some interesting information about how Optical Character Recognition has facilitated digital transcription. She also includes information about crowdsourcing, which is the strategy of outsourcing tasks traditionally performed by an employee or contractor to a large group of unpaid volunteers through an open call. Krista tells us about the Bentham Project, which allows interested members of the public to try their hands at transcribing scanned images of the great intellectual’s correspondence. In an effort to harness the spirit of competition to generate lots of quality transcriptions, the Bentham Project awards points to the best transcribers.

According to the Bentham Project’s blog post of 22 September 2010, the top transcriber was “currently Maureencallahan who has already racked up 2700 points for her contributions! Snefnug and Auto-icon are in joint second place with 2600 points.”

As someone who is helping to plan something similar to the Bentham Project for a major Canadian historical figure whose career was spent largely in the pre-typewriter age, I was very interested to read Krista’s informative post.

More details of the Canadian project will appear on this blog at a later date.

Update 1: Check out these blog posts about crowdsourcing and digital transcription. Here and here.

Update 2:

Check out the blog of the Transcribe Bentham crowdsourcing project. They have some great images there that allow potential volunteers to get a sense of the various stages of the project.

Digitisation of Bentham's Correspondence