Dataset Documentation

We built our datasets from text files that the Text Creation Partnership (TCP) has released to the public. For our purposes, we used the TCP's XML files, which are encoded in TEI P4. These texts have their own complicated histories: the files from EEBO-TCP, ECCO-TCP, and Evans-TCP were hand-keyed from digitized facsimiles.
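As an illustration of what working with such XML files can look like, the following minimal Python sketch pulls a title and the running text out of a small TEI-style document. The sample document, its element names (TEI.2, teiHeader, title, text), and the function name extract_title_and_text are assumptions made for illustration only. Actual TCP files are much larger and may use different element names or capitalization, so the lookup paths would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a TEI-encoded file.
# Real TCP files are far larger and their element names may differ.
SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc><titleStmt><title>A sample title</title></titleStmt></fileDesc>
  </teiHeader>
  <text><body><p>First paragraph.</p><p>Second   paragraph.</p></body></text>
</TEI.2>"""


def extract_title_and_text(xml_string):
    """Return (title, text) with whitespace collapsed to single spaces."""
    root = ET.fromstring(xml_string)

    # Take the first <title> element anywhere in the document, if present.
    title_el = root.find(".//title")
    title = title_el.text.strip() if title_el is not None and title_el.text else ""

    # Gather every text node under <text>, then normalize the whitespace.
    text_el = root.find(".//text")
    words = " ".join(text_el.itertext()).split() if text_el is not None else []
    return title, " ".join(words)


title, body = extract_title_and_text(SAMPLE)
print(title)
print(body)
```

Collapsing whitespace is a design choice for this sketch: it discards the line breaks and indentation of the source file, which is convenient for word-level analysis but loses layout information.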

Transmission History

The digitized facsimiles used for hand-keying come from several microfilming efforts. In 1938, University Microfilms International began filming books for a cultural preservation project titled Early English Books (EEB I). The books selected were those listed in A. W. Pollard and G. R. Redgrave's Short-Title Catalogue, a bibliography of English-printed books from 1473 to 1640. The microfilm project Early English Books II (EEB II) includes books listed in Donald Wing's Short-Title Catalogue, which continues Pollard and Redgrave's work by cataloguing English-printed books from 1641 to 1700. The Eighteenth Century Microfilm Collection emerged in tandem with the Eighteenth-Century Short-Title Catalogue, an electronic finding aid that records the microfilm location of individual texts; the collection focused on items printed in English from 1701 to 1800. The Evans American Imprints Series is based on Charles Evans's American Bibliography and Ralph R. Shaw and Richard H. Shoemaker's American Bibliography.

TCP Files and Selection Criteria

The TCP's goal is to provide standardized, XML-encoded electronic text editions of early printed books. To make these books accessible, it has partnered with institutions and with several publishing and information-technology companies, including ProQuest, Gale, and NewsBank. The TCP's electronic editions make these early books searchable, which matters because early printed texts pose problems for machine text-encoding technologies such as optical character recognition (OCR). For an account of working with the textual features of early modern printed texts, visit our project challenges page.

TCP COLLECTION    NUMBER OF FILES    PUBLIC RELEASE DATE
EEBO-TCP I        25,368             January 1, 2015
EEBO-TCP II       28,466             5 years from completion date
ECCO-TCP          2,473              April 25, 2011
Evans-TCP         4,977              June 30, 2014

EEBO-TCP Phase I: Selected first editions and works listed in the New Cambridge Bibliography of English Literature (NCBEL).

EEBO-TCP Phase II: Aims to provide an edition of each unique work in EEBO.

ECCO-TCP: Focused heavily on authors whose works span the divide between the 17th and 18th centuries, to produce continuity with EEBO-TCP, and on works that did not respond well to OCR.

Evans-TCP: The most-studied texts from the Evans bibliography, as identified by the American Antiquarian Society (AAS).