Encoding editions of documentary texts, particularly editions of correspondence,
within the Text Encoding Initiative (TEI) Guidelines raises
special challenges not encountered when editing previously published works. The
challenges fall into three broad categories: 1) difficulties in capturing
bibliographic meta-information describing the physical object and its transmission
history; 2) challenges in developing a controlled vocabulary suitable to the informal
nature of texts which were never intended for publication; and 3) difficulties in
encoding both the physical characteristics of the documentary texts and their
intellectual content, i.e. adopting a principle of encoding the text either as a
physical artifact or as a conceptual work. These challenges, particularly as they
relate to encoding letters, will be explored through an edition currently in
preparation, entitled Thomas MacGreevy and George Yeats: A Friendship in
Letters.
During the next two years members of The Thomas MacGreevy
Archive team will be creating for online publication an edition of the
correspondence between George Yeats (1893-1968), wife of the Irish poet W.B. Yeats,
and Thomas MacGreevy (1893-1967), Irish poet, art and literary critic, and Director
of the National Gallery of Ireland (1950-63). It is a collection spanning 41 years,
comprising 148 letters. The letters are fascinating documentary records which provide
a window not only into the personal lives of the authors, but into the artistic and
political circles in which they moved, providing a unique insight into the new Irish
Free State and the cultural climate of Europe during the first half of the twentieth
century. The letters are being encoded using Extensible Markup Language (XML)
according to the newly released P5 TEI Guidelines to take
advantage of the TEI’s new chapter on Manuscript Description.
Although the TEI Guidelines were not developed specifically
to encode previously published texts, many of the rules built into the syntax of the
Document Type Definitions (DTDs) favor this document type. To cite but one example,
the content model of tei.divbot does not allow for a paragraph element (<p>)
after the closer element (<closer>). While the need for additional paragraphs
after closing material in published texts may be uncommon, letters frequently have a
closing salutation, followed by a postscript. Moreover, it has proved difficult
within the TEI header to detail the type of descriptive information that editors,
scholars, and bibliographers require when engaging with handwritten documents.
Individual projects (such as DALF: Digital Archive of Letters in
Flanders Project) and subject-area consortia (such as The Model Editions Partnership) have developed their own extensions to
the TEI Guidelines to accommodate the needs of electronic
editions of correspondence. After a brief survey of the strategies employed by these
and other editions, we will discuss how TEI’s new chapter on manuscript description
alleviates some of the problems previous projects solved with local solutions. The
chapter on Manuscript Description builds on the work of two separate initiatives
which have recently been combined: the MASTER project
(1999-2001), an EU-funded project headed by Peter Robinson, and the work of the
TEI Medieval Manuscripts Description Work Group
(1998-2000), headed by Consuelo Dutschke and Ambrogio Piazzoni. The new elements
available in this tagset provide for detailed description of primary texts including
transmission, physical description, the relationship between parts of the manuscript
(for example, when a poem is enclosed with a letter), dimensions, location,
manuscript identification, provenance, and history of ownership.
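As a minimal sketch of the content-model restriction noted above, the following
Python fragment parses a simplified letter and reports any paragraph that follows
a closer, which is the pattern a postscript typically produces. The letter text,
the function name paragraphs_after_closer, and the use of Python's standard
xml.etree.ElementTree module are assumptions made for illustration; the fragment
is deliberately simplified and does not reproduce the edition's actual encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified letter body. The wording is invented for
# illustration and is not taken from the MacGreevy/Yeats correspondence.
LETTER = """
<div type="letter">
  <opener><salute>Dear Tom,</salute></opener>
  <p>Body of the letter.</p>
  <closer><salute>Yours ever,</salute><signed>George</signed></closer>
  <p>P.S. A postscript added after the signature.</p>
</div>
""".strip()


def paragraphs_after_closer(div):
    """Return the <p> children of div that come after a <closer> sibling."""
    seen_closer = False
    trailing = []
    for child in div:
        if child.tag == "closer":
            seen_closer = True
        elif child.tag == "p" and seen_closer:
            trailing.append(child)
    return trailing


div = ET.fromstring(LETTER)
for p in paragraphs_after_closer(div):
    print("paragraph after <closer>:", p.text)
```

A check of this kind could be run across a collection to estimate how many
letters contain postscripts before deciding whether a local extension to the
content model is needed.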
Another area to be discussed is the difficulty of developing an ontology or
controlled vocabulary for a correspondence. The ontology, the backbone for the search
page, is more difficult to develop for a collection of letters than for other document
types. Subject headings, such as the Library of Congress Subject
Headings (LCSH), which are used to describe
entire collections or self-contained bodies of information, are not suitable for this
project which describes each letter individually. The problem with using schemes such
as LCSH is twofold: first, the letters cover many subjects and
follow no formal organization pattern, making it difficult to make a faceted indexing
schema like LCSH worthwhile; second, the subject headings
were meant to be used in the cataloging of cohesive works or collections, and were
not designed to be brief entries in the index for a specific work or collection.
The indexing done for this edition more closely resembles back-of-the-book style
indexing in terms of its description of the details of the text. Standard controlled
vocabularies that might be used in this type of indexing, such as the Getty Art and Architecture Thesaurus, are, on the other hand, too specific,
and their terms do not sufficiently summarize or categorize the topics discussed.
Capturing, representing, and, indeed, interpreting the multitude of topics present in
any given letter — from general subjects to more intimate personal details
— is of paramount importance. If ontology is defined as a "formal, explicit specification of a shared conceptualization"
(Fensel 11), the burden placed on a third party of determining what the "shared conceptualization" of a text written for an intended audience of one might be is immense. Indeed, as the
correspondence itself often indicates, meaning is often misconstrued by the intended
recipient. Given these difficulties, other types of structured data, such as
annotation and abstracts, may be used to mitigate issues of keywords conveying
different meanings when taken out of textual context.
Another challenge when editing documentary texts for electronic publication is
choosing a philosophy by which to encode. This is particularly true in the case of
editing modern correspondence. Editors have traditionally had to decide whether the
purpose of the encoding is to capture the physical appearance of the page (regardless
of the text's logical sequence), or whether it is to record the textual/ontological
flow (regardless of the text's physical appearance). In traditional print
publications, editions (except for facsimiles) reflect a logical sequencing of the
text. For example, text which appears in the margins is placed where the editor feels
it belongs logically, even when the writing crosses page boundaries (such as
finishing a letter in the margins of the first page when the author ran out of room
on the last).
This edition is exploring methods of encoding both the physical appearance of the
page and the letter’s logic. This is particularly challenging when encoding,
for example, marginalia. To represent the marginalia within the logical sequence of
the text, the editor must decide where it is to be anchored within the textual flow.
To represent it in a physical representation, the editor must provide coordinates
that will anchor the text vertically and horizontally in relation to the main body of
the work. While some of this positioning is absolute, for example, anchoring text at
the top of the page, other positioning is relative, for example, anchoring marginalia
relative to the paragraph it appears next to. While the encoding must take into
account, in some measure, the technologies available to us today (XSLT, CSS, and
JavaScript, for example), it must at the same time be designed with a view to
future presentations, independent of current technologies.
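The dual requirement described above can be sketched in a technology-neutral
way. The following Python fragment is an illustration under stated assumptions,
not the edition's data model: the Segment class, its field names, and the sample
text are all hypothetical. Each piece of text records both its place in the
reading order and its position on the page, so that a logical and a physical
view can be derived from a single encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One block of writing, carrying both logical and physical information."""
    ident: str
    text: str
    page: int                      # physical: page on which the writing appears
    reading_order: int             # logical: position in the textual flow
    place: str = "body"            # physical: "body", "top", "margin-left", ...
    beside: Optional[str] = None   # physical, relative: ident of the adjacent block


# Hypothetical letter listed in physical order: the writer ran out of room
# on the last page and finished in the left margin of the first page.
LETTER = [
    Segment("p1", "Opening paragraph.", page=1, reading_order=1),
    Segment("m1", "Conclusion squeezed into the margin.", page=1,
            reading_order=3, place="margin-left", beside="p1"),
    Segment("p2", "Paragraph on the last page.", page=2, reading_order=2),
]


def logical_view(segments):
    """Return the text in reading order, ignoring position on the page."""
    return [s.text for s in sorted(segments, key=lambda s: s.reading_order)]


def physical_view(segments):
    """Group segments by page, keeping their placement information."""
    pages = {}
    for s in segments:
        pages.setdefault(s.page, []).append((s.ident, s.place, s.beside))
    return pages


print(logical_view(LETTER))
print(physical_view(LETTER))
```

Keeping both kinds of information on the same unit leaves the choice of
presentation to the rendering stage, whatever technology that stage uses.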
This is a sampling of the issues that will be discussed.