- Introduction
- Exhibition
- An Example
- Why Exhibition is a Problem for Modeling
- Relevant Work
- False Resolutions
- Conclusion
There has recently been increased interest within the humanities computing community in formalizing the 'semantics' of document markup (e.g., Sperberg-McQueen et al., Buzzetti, Witt, Renear et al. 2002, Bayerl et al., Dubin et al., Sasaki), or, in an alternative characterization, developing 'conceptual models' that generalize the representation of the textual structures (Cover). We endorse this agenda, which has been a long time in the making (Raymond et al.), but we also wish to draw attention to some difficulties that may be unique to cultural material.
Natural human communication is characterized by what might be termed plenary semiosis. Without waiting for formal languages to be provided, humans immediately proceed to attempt to say everything they think, and, at least arguably, they generally succeed. The result is that natural communication systems exhibit every imaginable feature that troubles knowledge engineers: fuzzy predicates, modal notions, non-extensional contexts, incompleteness, inconsistency, ambiguity, and so on. But there is an additional complexity as well. Human communication takes place within social contexts that, as linguists and philosophers have been telling us for some time, confound efforts to conceptualize it as sets of assertions only. These two aspects of human communication, plenary semiosis and multiple interacting levels of non-assertional representation, combine to produce some of the most difficult, and significant, features of communicative artifacts. We describe one of these features and argue that unless current conceptual modeling systems are extended to accommodate this and other related features, those systems will be inadequate for the representation of cultural objects.
In ordinary linguistic communication we often use a name to refer to something in order to then go on to attribute some property to that thing. However when we do this we do not naturally construe our linguistic behavior as being at the same time an assertion that the thing in question has that name. We do however have a particular cognitive relationship to this latter state-of-affairs; it is just that this attitude is not one of assertion — we rely on, or are committed to, or presuppose that the thing in question has the name we are using to refer to it, but we are not asserting that it does.
We refer to this relationship as exhibition. We say that the brief document/utterance "Moby Dick was written by Herman Melville" exhibits the state of affairs that "the name of the author of Moby Dick is 'Herman Melville'", but it does not assert that state of affairs. What it does assert is that Melville is the author of Moby Dick. Although naming is our prototypical example of exhibition in this paper, we believe that exhibition is a widespread and diverse phenomenon.
Consider this XML markup, adapted from the TEI Guidelines (P4):
<bibl>
<author>Edward R. Tufte</author>
<title>Envisioning Information</title>
<pubPlace>Cheshire, Conn.</pubPlace>
<publisher>Graphics Press</publisher>
</bibl>
The Guidelines characterize these element types as follows:
- author: "... contains the name of the author(s), personal or corporate, of a work ...".
- title: "... contains the title of a work ...".
- publisher: "... provides the name of the organization responsible for the publication ... of a bibliographic item".
- pubPlace: "contains the name of the place where a bibliographic item was published."
Close reading of these definitions reveals that these markup tags convey two quite different sorts of information:
Set A
- Edward R. Tufte authored Envisioning Information.
- Envisioning Information was published by Graphics Press.
- Envisioning Information was published in Cheshire, Connecticut.
Set B
- The name of the author of [this book] is "Edward R. Tufte".
- The name of the publisher of [this book] is "Graphics Press".
- The name of the place where [this book] was published is "Cheshire, Connecticut".
First, note that the markup is overloaded. The markup tag author is used to say that something is a name, and it is also used to say that someone is an author (or the author of a particular book). Consider a representation in any commonly used data modeling language, say, RDF's graph-based representation: nodes for individual entities and arcs for binary relationships between them. We would expect a single arc for the assertion represented by a single element, but here apparently a single element must be unpacked into two arcs. TEI specialists are fond of saying that TEI markup is about the text, not about the world the text is about; but we see plainly this isn't always so. And we also note that this overloading crosses a profound and famously troublesome semantic boundary: that between using an expression and mentioning it.
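The unpacking of a single author element into two arcs can be sketched as follows. This is only an illustration, not a proposed mapping: the node identifiers `_:person` and `_:work` and the arc labels `authored` and `hasName` are hypothetical names introduced here, and arcs are modeled as plain Python tuples rather than with an RDF library.

```python
import xml.etree.ElementTree as ET

# The TEI fragment from the example above.
BIBL = """<bibl>
<author>Edward R. Tufte</author>
<title>Envisioning Information</title>
<pubPlace>Cheshire, Conn.</pubPlace>
<publisher>Graphics Press</publisher>
</bibl>"""


def unpack_author(bibl_xml):
    """Return the two arcs that one <author> element appears to carry.

    Node identifiers and arc labels are hypothetical, chosen for illustration.
    """
    root = ET.fromstring(bibl_xml)
    name = root.findtext("author")
    return [
        # An arc about the world: someone authored something (Set A style).
        ("_:person", "authored", "_:work"),
        # An arc about a name: that someone is called by this string (Set B style).
        ("_:person", "hasName", name),
    ]


arcs = unpack_author(BIBL)
for arc in arcs:
    print(arc)
```

The point of the sketch is only that one element yields two arcs of different kinds: the first uses the name to pick out a person, while the second mentions the name itself.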
But the most revealing feature of this analysis is that when we take the union of assertions in sets A and B we will have a model of possible semantic content that, as a whole, is almost certainly incorrect; at least in this sense: it is unlikely that there is any single communicative object whose semantics is correctly modeled by this set of assertions. There are two cases to consider; we present them using some terminology ('expression', 'work') from the Functional Requirements for Bibliographic Records (IFLA 1997) and distinguish two senses of 'XML document' (Renear et al. 2003).
- Consider first an XML document that is understood to be a symbolic expression realizing an intellectual work such as, say, a manual about web design. Such a document will be correctly understood as making the assertions in Set A, but not as asserting any of the assertions in Set B.
- Now consider an XML document that is a transcription of a source text, that is, a document that is an expression realizing a work which is itself a "theory of the text" (Sperberg-McQueen); that text (expression) being the text of the manual (a work). Such a document would generally be understood as making the assertions in Set B, but not as asserting any of the assertions in Set A.
As a consequence a correct graph model for either case cannot represent the assertions (as assertions) in the other case. However a correct representation of Case 1 could represent Case 2 assertions as exhibitions — if specific expressive devices, qualified arcs say, were available for this representation. This is the extension that we are recommending.
Some clarification of these intricacies may be useful. First note that the cases are not isomorphic: Case 1 asserts the propositions in Set A and exhibits those in Set B, but Case 2, although asserting the propositions in Set B, does not exhibit the propositions in Set A. While it might be plausibly argued that the propositions in Set B logically imply those in Set A, and so any document that asserts Set B asserts Set A, we would resist this for two reasons: first because the intuitive logic of assertion simply does not seem to require that all logical implications of asserted propositions are themselves asserted; and second because we suspect that a completely correct presentation of Set B, one more in line with TEI doctrine on the textual orientation of markup, would eliminate all commitments to books, authors, and authorship, and that paraphrase would block the logical implications in any case. What one could say however is that in Case 2 the Set A propositions occur in oratio obliqua.
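One way to picture the qualified arcs recommended above is a graph in which every arc carries a mode. The sketch below is a minimal illustration under stated assumptions: the `Mode` enumeration, the `Arc` class, the node identifiers, and the arc labels are all hypothetical and do not correspond to any existing modeling language.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical qualifier on an arc: how the document relates to it."""
    ASSERTED = "asserted"
    EXHIBITED = "exhibited"


@dataclass(frozen=True)
class Arc:
    subject: str
    predicate: str
    obj: str
    mode: Mode


def case1_model():
    """Case 1: Set A is asserted, Set B is exhibited."""
    return [
        Arc("_:author", "authored", "_:book", Mode.ASSERTED),
        Arc("_:author", "hasName", "Edward R. Tufte", Mode.EXHIBITED),
    ]


def case2_model():
    """Case 2: Set B is asserted; Set A is neither asserted nor exhibited."""
    return [
        Arc("_:author", "hasName", "Edward R. Tufte", Mode.ASSERTED),
    ]


def arcs_with_mode(arcs, mode):
    """Select the arcs that a document holds in the given mode."""
    return [a for a in arcs if a.mode is mode]
```

Filtering by mode keeps the two cases distinct: in the Case 1 model the naming arc is present but is not returned among the asserted arcs, and in the Case 2 model no authorship arc appears at all, which reflects the asymmetry between the cases.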
We also note, as an illustration of the usefulness of the concept of exhibition, that scholarly transcription into TEI markup can be understood as identifying exhibitions and then re-expressing them as assertions.
The rudiments of this problem have already made an appearance in the Semantic Web and Dublin Core communities. However we do not think its significance, at least for cultural material involving human communication, is fully appreciated. Dan Brickley, chair of the W3C Semantic Web Interest Group, has noted that the Dublin Core dc:creator element is defined in a way that encourages a similar confusion between names and things (Brickley). This is not surprising, as the definition of dc:creator is similar in logical structure to the ones we cite from the Guidelines:
"Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity." (DCMES 2003)
The varying usage of the dc:creator code (sometimes for Creators, sometimes for their names) amongst metadata encoders is now recognized as a serious practical problem for the development of an 'abstract model' for Dublin Core (Powell).
It is of course in the context of efforts to be absolutely precise and formal that the problem is acute. Jeremy Carroll, co-editor of the W3C Recommendation Resource Description Framework (RDF): Concepts and Abstract Syntax, writes in a posting on w3c-rdfcore-wg:
"I have been looking through the (RDF) primer, particularly looking at the Dublin Core examples (throughout the primer). These seem like perfectly fair examples of how Dublin Core is used. Unfortunately, there are many instances where strings are used to represent people and things rather than themselves. This is not in agreement with the model theory..." (Carroll)
Carroll then goes on to note that, given the RDF model theory, incorrect implications immediately ensue: in our example, for instance, that Moby Dick was authored by a string rather than by a person.
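The implication Carroll describes can be made concrete with a small sketch. It does not use an RDF library; triples are plain tuples, and the `Literal` and `Resource` classes, the `ex:name` property, and the node labels are hypothetical stand-ins introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A string value, standing for itself."""
    value: str


@dataclass(frozen=True)
class Resource:
    """A node standing for a thing in the world (label is hypothetical)."""
    label: str


WORK = Resource("MobyDick")

# Common usage: the creator slot is filled directly with a string.
literal_style = [
    (WORK, "dc:creator", Literal("Herman Melville")),
]

# A variant that keeps the person and the name apart.
person = Resource("_:melville")
resource_style = [
    (WORK, "dc:creator", person),
    (person, "ex:name", Literal("Herman Melville")),  # ex:name is an assumed property
]


def creators(triples):
    """Return whatever occupies the object position of dc:creator."""
    return [o for (s, p, o) in triples if p == "dc:creator"]
```

Read literally, `creators(literal_style)` yields a string value as the creator of the work, which is the unwanted conclusion; in the second form the creator is a node and the string appears only as that node's name.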
Three deflationary perspectives on this problem are possible.
One, anticipated from the TEI community, is that TEI encoding always represents the features of the linguistic text only and 'real-world' assertions are either misunderstandings, mistakes, or anomalies. This may be so, although we are skeptical as to whether this stance can be maintained with respect to the full range of TEI applications. But in any event exhibition remains a common feature of communicative artifacts, characteristic of many XML element sets, and of many other systems of symbolic communication. It must be accommodated.
Another approach, this one anticipated from the Semantic Web community, is simply to insist on an unambiguous corrected conceptual representation: one arc for being named "Herman Melville", one for authoring Moby Dick. But this resolution fails for the reasons presented in the preceding section. Although this model would be in some sense an accurate representation of "how the world is" according to the document, it would not represent what is asserted by the document. The authorship arc in the corrected RDF graph model will correspond to relationships of exhibition, not assertion; and there is no accommodation for this distinction in the modeling language.
Finally, it is also natural to feel that the phenomenon of exhibition is similar in some respects to the already noted, much-studied phenomenon of linguistic presupposition and to wonder whether exhibition is simply a special case of presupposition (Levinson). Currently we are undecided on this issue, but we note that even if exhibition does turn out to be a form of presupposition, that would remove neither the difficulty exhibition creates for conceptual modeling nor its intellectual significance. In fact it would be a rather substantial finding to determine the matter one way or the other.
The phenomenon of exhibition is not limited to the simple naming examples used above. We believe it is characteristic of communication and communicative cultural artifacts in general. For instance when we title our articles we do not say that the title is a title, although we exhibit it as a title, allowing that inference to be drawn (Renear). Or for a quite different sort of case: consider how morphological distinctions exhibit our commitments to syntactical roles, without actually asserting that the words in question are playing those roles — though indeed we use those words with those particular grammatical and syntactical properties in order to make the assertions we do make.
We conclude that current conceptual modeling projects within the humanities computing community will fail to be adequate for the study of cultural objects if they take the approach of the Semantic Web community and see exhibition as a simple problem of ambiguity or error, rather than defining new constructs to express these distinctive relationships. To be adequate for the humanities, conceptual modeling must be extended to accommodate the data of the humanities.