Data Projects

Opening up historical data

Historical datasets that I’ve created and shared. Some were created as part of personal research projects, often to enhance or add structure to existing data; others came out of larger collaborative projects.

London Lives Petitions 1690-1800

Metadata and texts for about 10,000 eighteenth-century petitions addressed to London magistrates. The sources were originally digitised as part of the London Lives project. I identified the petitions among about 100,000 pages of heterogeneous documents using regex search strategies built around the distinctive textual features of petitions, and used the London Lives tagging to find petitioners’ names. (2015)

LLPP »
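To make the regex approach more concrete, here is a minimal sketch of scoring pages by formulaic petition language. It assumes pages are plain-text strings keyed by a page ID. The phrases in `PETITION_PATTERNS` (such as "humble petition" and "shall ever pray") and the `min_score` threshold are hypothetical illustrations, not the search strategies actually used for this dataset.

```python
import re

# Hypothetical patterns for formulaic petition language; illustrative only,
# not the search strategies actually used to build the dataset.
PETITION_PATTERNS = [
    re.compile(r"\bhumble\s+petition\b", re.IGNORECASE),
    re.compile(r"\bto\s+the\s+(?:right\s+)?worshipful\b", re.IGNORECASE),
    re.compile(r"\byour\s+petitioner", re.IGNORECASE),
    re.compile(r"\bshall\s+ever\s+pray\b", re.IGNORECASE),
]


def petition_score(text: str) -> int:
    """Count how many distinct petition formulae appear in a page."""
    return sum(1 for pattern in PETITION_PATTERNS if pattern.search(text))


def find_petition_pages(pages: dict[str, str], min_score: int = 2) -> list[str]:
    """Return the IDs of pages matching at least `min_score` formulae."""
    return [page_id for page_id, text in pages.items()
            if petition_score(text) >= min_score]


# Invented sample pages for illustration.
pages = {
    "p1": "To the Worshipful His Majesty's Justices ... The humble Petition of "
          "Mary Smith ... and your Petitioner shall ever pray",
    "p2": "Middlesex. The Examination of John Brown taken upon oath ...",
}
print(find_petition_pages(pages))  # ['p1']
```

Requiring several independent formulae rather than a single phrase makes the filter less likely to pick up documents that merely mention a petition.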

London Lives Pauper Examinations 1740-1800

Dataset of about 28,000 individuals in 10,700 eighteenth-century pauper examinations from two London parishes, with a smaller supplementary dataset of removal orders. The examinations had been digitised as part of London Lives; creating the dataset involved correctly separating individual examinations based on their text patterns, using the London Lives name tagging to find people, identifying the role of each named person and deduplicating repeated mentions of names. (2016)

LLEP »
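A minimal sketch of splitting running text into separate examinations, assuming (hypothetically) that each examination opens with a heading like "The Examination of". The `EXAM_START` pattern and the sample text are invented for illustration; the actual text patterns used are not described here.

```python
import re

# Hypothetical marker for the start of a new examination. The lookahead
# makes the split zero-width, so each heading stays attached to its text.
EXAM_START = re.compile(r"(?=\bThe\s+Examination\s+of\b)", re.IGNORECASE)


def split_examinations(text: str) -> list[str]:
    """Split running text into chunks, each starting at an examination heading.

    Blank fragments are dropped; any non-blank text before the first
    heading is kept as its own chunk.
    """
    parts = EXAM_START.split(text)
    return [part.strip() for part in parts if part.strip()]


# Invented sample text for illustration.
sample = (
    "The Examination of John Brown, labourer, who on his oath saith that he "
    "was born in the parish of St Giles. "
    "The Examination of Ann Green, widow, who saith that she was married ..."
)
print(len(split_examinations(sample)))  # 2
```

Splitting on a zero-width lookahead (supported by `re.split` since Python 3.7) keeps each heading at the start of its own chunk instead of discarding it.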

London Lives Coroners’ Inquests

Data for 2,894 Westminster inquests, including dates, places, names of the deceased, verdicts and causes of death, and texts of the formal inquisitions, from a series of inquests digitised for London Lives. The dataset is partly based on an existing catalogue created by Tim Hitchcock; creating it involved identifying individual cases and extracting information from them based on the standardised format of an early modern inquisition. (2018)

Coroners »

Old Bailey Voices 1780-1880

Text corpus and summary data for 21,000 trials reported in the Proceedings between 1780 and 1880, created for the Digital Panopticon. It is a “remix” of two existing datasets: the Old Bailey Proceedings (OBP) and the Old Bailey Corpus (OBC), a subset of the OBP data enhanced with linguistic markup. The remix was carried out to improve the accuracy of the OBC’s speaker role markup, link individual defendants with their words and trial outcomes, and convert the XML to a tabular format. (2018)

OBV »
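A minimal sketch of flattening XML trial markup into a table using only Python's standard library. The element and attribute names (`trial`, `u`, `speaker`, `role`) are hypothetical stand-ins, not the actual Old Bailey Corpus schema, and the sample trial is invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical markup: <trial id=...><u speaker=... role=...>text</u></trial>.
# The real Old Bailey Corpus schema differs; these names are assumptions.
SAMPLE = """<trials>
  <trial id="t17800112-1">
    <u speaker="Court" role="judge">What have you to say?</u>
    <u speaker="John Brown" role="defendant">I found the goods in the street.</u>
  </trial>
</trials>"""

FIELDS = ["trial_id", "speaker", "role", "text"]


def trials_to_rows(xml_text: str) -> list[dict]:
    """Flatten each utterance into one row keyed by trial ID and speaker."""
    root = ET.fromstring(xml_text)
    rows = []
    for trial in root.iter("trial"):
        for utterance in trial.iter("u"):
            rows.append({
                "trial_id": trial.get("id"),
                "speaker": utterance.get("speaker"),
                "role": utterance.get("role"),
                "text": (utterance.text or "").strip(),
            })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(trials_to_rows(SAMPLE)))
```

One row per utterance keeps the speaker role next to the words spoken, which makes it straightforward to filter for a defendant's own words.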

Old Bailey Prisoner Defences 1751-1900

Corpus of all prisoner defence statements that could be identified in trials in the Old Bailey Proceedings between 1751 and 1900, with metadata about the trial, defendant(s), offence(s) and outcome(s). The Old Bailey Corpus data was used to help understand typical text patterns for such statements so that they could be extracted from the full OBP dataset. (2018)

Defences »

Home Office Criminal Registers 1791-1802

Dataset created for the Digital Panopticon from registers of prisoners held in Newgate Prison awaiting trial, mainly at the Old Bailey. The registers were originally digitised by London Lives, and creating the dataset was a collaborative process: after DHI staff automatically added structural markup and DP researchers manually checked and corrected it, I cleaned up and standardised prisoner names, ages, heights, trial dates, prison and court names, offences and outcomes. (2018)

HCR »

Middlesex/Westminster Calendars of Prisoners 1836-1889

Dataset created for the Digital Panopticon project from printed lists of prisoners tried at the Middlesex or Westminster Sessions of the Peace; it includes the prisoner’s name, age, occupation, literacy, previous convictions, offence, trial and outcome. A subcontractor transcribed images of the calendars using OCR; my work included cleaning the noisy OCR output, converting it to one prisoner per row, and extracting ages, dates, sentences, previous offences, etc. into structured fields. (2017)

CPM »
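A minimal sketch of turning one cleaned OCR line into a structured row, including a repair of common letter-for-digit OCR errors in the age field. The `NAME, AGE, OCCUPATION` layout, the `ENTRY` pattern and the `OCR_DIGIT_FIXES` table are hypothetical assumptions, not the actual format of the calendars or the cleaning rules used.

```python
import re
from typing import Optional

# Hypothetical layout of one cleaned calendar entry: "NAME, AGE, OCCUPATION".
# The age group also accepts letters that OCR often confuses with digits.
ENTRY = re.compile(
    r"^(?P<name>[^,]+),\s*(?P<age>[0-9OolI]{1,3}),\s*(?P<occupation>[^,]+)$"
)

# Common OCR confusions in numeric fields (letter read instead of digit).
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def parse_entry(line: str) -> Optional[dict]:
    """Turn one OCR line into a structured row, or None if it doesn't match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    age_text = match.group("age").translate(OCR_DIGIT_FIXES)
    return {
        "name": match.group("name").strip(),
        "age": int(age_text),
        "occupation": match.group("occupation").strip(),
    }


print(parse_entry("John Smith, 2O, labourer"))
# {'name': 'John Smith', 'age': 20, 'occupation': 'labourer'}
print(parse_entry("illegible smudge"))  # None
```

Returning `None` for non-matching lines leaves them available for manual review rather than silently forcing them into the wrong fields.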

Metropolitan Police Register of Habitual Criminals 1881-1925

Dataset created for the Digital Panopticon from registers of “habitual” criminals recorded on their release from prisons in England and Wales between 1881 and 1925. Includes the name of the prisoner, year and place of birth, height, physical appearance, occupation, conviction and sentence details, previous convictions, prison and intended destination on liberation. Again, this was created from OCR transcriptions, using similar methods to those for the Middlesex Calendars of Prisoners. (2018)

RHC »

Middlesex Convicts Delivered For Transportation 1785-92

Data for 1,515 offenders convicted at the Old Bailey and Middlesex Sessions and conveyed to various ships for transportation to Australia. Created from the XML of the Order Books digitised by London Lives; I added structure for names, dates and ships. (2015)

CDT »

York Defamation Causes 1660-1700

Dataset of 107 defamation causes in the city of York, 1660-1700, based on a handlist and transcriptions I created in 1999 during archival research for my MA thesis and more recently checked and updated using the online York Cause Papers database. Includes names of plaintiffs and defendants, dates, information about linked cases, defamatory words and types. (2014)

YCP »