TEI 2017, Victoria, British Columbia, Canada, November 11–15

Sun Nov 12, 09:00–16:30

Linkable Data, Linked Data, Text Encoding and the Need for Well-Defined Conceptual Models in the Digital Humanities (workshop)

Christian-Emil Ore*

* Christian-Emil Ore is an associate professor and head of the Unit for Digital Documentation (EDD) at the Department of Linguistics and Scandinavian Studies, University of Oslo. He has worked with digital methods in the humanities for 25 years: methods for cultural heritage documentation, (e-)lexicography and corpora, and text encoding and electronic text editions. He has participated in and coordinated long-term language documentation projects in Norway and in Southern Africa, served on scientific and advisory boards in the US, Germany and Scandinavia, chaired ICOM-CIDOC (2004–2010), co-chaired the TEI ontology SIG, and participated in the development of CIDOC-CRM and FRBRoo since 2002.

1. Background

Since the mid-1990s there has been growing interest in the design and use of conceptual models (ontologies) in library science as well as in Digital Humanities. In the text-oriented Digital Humanities, however, conceptual models and ontologies have been considered closer to database development than to text research. This was the prevailing view in the TEI community until recently. The introduction of Linked Data eight years ago (Berners-Lee 2009) has put more focus on what we may call “real world information” and on how such information can be found in and extracted from textual resources.
Reproducibility of results is a core concept in text-based research, as in all research. The content of information systems and virtual reconstructions in the cultural heritage sector is to a large degree based directly on information deduced from text studies. In many cases the links from the information system back to the texts are not available, and such links may be difficult to re-establish. Even where it is possible to re-establish them, the process may be too expensive. These links are necessary for the deduction to be reproducible, since they document how the conclusions are based on the texts.
Linked Data offers a simple way to publish data through an open and uniform interface, enabling others to link to scholarly data resources. Thus Linked Data should be ideal for building resources in the Digital Humanities (Ore 1998).
The programmatic slogan of the Semantic Web and Linked Data community is: “Anyone can say anything about anything.” That is, anything can be linked. From a scholarly and scientific point of view this is not satisfactory. Information is generated through exclusion, using meaningful distinctions drawn according to a common conceptual model or formal ontology. Thus meaningful information integration in a scholarly field using the Linked Data mechanism requires a common conceptual model for the context in question.
How should structured information, based on a reading of a text, be linked to the encoded text itself? It is important to base such linking on the data standards that have evolved in the fields of text encoding and conceptual modelling. Thus, the understanding of text encoding represented by the TEI Guidelines and the understanding of conceptual models represented by initiatives like the CIDOC CRM and FRBRoo should be combined.
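As a minimal sketch of what such a link could look like, the Python fragment below attaches pointers to `xml:id` values in a small TEI-like fragment to a statement deduced from the text, so that the evidence can be retrieved again later. The sample text, the `ex:` identifiers, the statement layout and the helper names `index_by_id` and `evidence_for` are all assumptions made for illustration; they are not taken from the TEI Guidelines, CIDOC CRM or FRBRoo.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the XML namespace in Clark notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample text, used only to illustrate the linking idea.
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p xml:id="p1"><persName xml:id="pers1">Anna</persName> sold the farm in <date xml:id="d1" when="1750">1750</date>.</p>
  </body></text>
</TEI>"""


def index_by_id(root):
    """Map every xml:id in the document to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}


def evidence_for(assertion, index):
    """Return the text content of each passage the assertion points to."""
    return ["".join(index[ptr.lstrip("#")].itertext()) for ptr in assertion["sources"]]


root = ET.fromstring(TEI_SAMPLE)
index = index_by_id(root)

# A statement deduced from the text, carrying pointers back to its evidence.
# The "ex:" identifiers and property name are hypothetical.
assertion = {
    "subject": "ex:sale1",
    "property": "ex:had_participant",
    "object": "ex:anna",
    "sources": ["#p1", "#pers1"],
}

print(evidence_for(assertion, index))
```

Keeping the pointers with the statement, rather than in a separate log, is one possible design choice: the statement can then travel between systems without losing its link to the text.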
A conceptual model or ontology is not a specification for a technical implementation, nor is it a closed vocabulary or a thesaurus. It should be the result of a conceptualisation of a domain and of the ontological commitments based on this analysis. It is usually expressed as a hierarchy of concepts connected by properties or relationships. There are some important principles which should be observed. First of all, the model should follow the open-world assumption. Secondly, the modelling process should be bottom-up, that is, starting with the empirical data. Finally, the intension of the classes or concepts in the model should focus on identity, substance, unity and existence.
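The two structural ideas in this paragraph, a hierarchy of concepts and the open-world assumption, can be sketched in a few lines of Python. The class names below are loosely inspired by CIDOC CRM but the hierarchy is a simplified assumption, and the names `SUBCLASS_OF`, `FACTS`, `is_a` and `ask` are hypothetical. The point of the sketch is only that a missing statement is reported as unknown (`None`), not as false.

```python
# Hypothetical mini-model: a simplified illustration, not the CIDOC CRM hierarchy.
SUBCLASS_OF = {
    "Person": "Actor",
    "Group": "Actor",
    "Actor": "Entity",
    "Event": "Entity",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


# Known statements as (subject, property, object) triples; invented data.
FACTS = {
    ("anna", "participated_in", "sale_1750"),
}


def ask(statement, facts=FACTS):
    """Open-world query: True if asserted, otherwise None (unknown), never False."""
    return True if statement in facts else None


print(is_a("Person", "Entity"))
print(ask(("anna", "participated_in", "sale_1750")))
print(ask(("ole", "participated_in", "sale_1750")))
```

Under a closed-world reading the last query would answer "no"; here the absence of a statement only means that the data say nothing about it.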

2. Workshop Outline

The workshop is divided into four main parts:
  1. Introduction to conceptual modelling and ontologies, Linked/able Data and encoded texts.
  2. Event-oriented modelling and data integration. An introduction to CIDOC-CRM (ISO 21127): background, purpose, design principles.
  3. A short introduction to the family of CRM extensions, especially FRBRoo, an object-oriented version of the library model FRBR. FRBRoo is a more detailed model for intellectual works and can be used for modelling metadata for the visual and performing arts; some examples will be given.
  4. Mapping data. The tool 3M and the format X3ML will be used as examples. The tool was originally developed for mapping data to the EDM (Europeana Data Model), was further refined as part of a British, Swedish and Greek project, and is now maintained in close connection with the CIDOC-CRM SIG.
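The idea behind event-oriented mapping in parts 2 and 4 can be illustrated with a small Python sketch that turns one flat record into statements centred on a birth event. This is not X3ML syntax and not the output of 3M: the function `map_birth_record`, the record layout and the `ex:` identifiers are assumptions made for illustration, and `rdfs:label` is used as a simplification for naming. Only the CRM class and property names (E21 Person, E67 Birth, P98 brought into life, P4 has time-span, P7 took place at) are taken from the CIDOC CRM.

```python
def map_birth_record(record):
    """Turn one flat record into event-centred (subject, property, object) triples."""
    person = f"ex:person/{record['id']}"
    birth = f"ex:birth/{record['id']}"
    triples = [
        (person, "rdf:type", "crm:E21_Person"),
        (person, "rdfs:label", record["name"]),
        # The birth is modelled as an event in its own right,
        # so time and place attach to the event, not to the person.
        (birth, "rdf:type", "crm:E67_Birth"),
        (birth, "crm:P98_brought_into_life", person),
    ]
    if record.get("birth_year"):
        triples.append((birth, "crm:P4_has_time-span", f"ex:timespan/{record['birth_year']}"))
    if record.get("birth_place"):
        triples.append((birth, "crm:P7_took_place_at", f"ex:place/{record['birth_place']}"))
    return triples


# Illustrative source record in a hypothetical flat layout.
record = {"id": "1", "name": "Henrik Ibsen", "birth_year": "1828", "birth_place": "Skien"}

for triple in map_birth_record(record):
    print(triple)
```

Because missing fields simply produce no statements, the same function can be applied to incomplete records without asserting anything the source does not say.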
The workshop is intended to be a tutorial with an active conversation between the participants and the workshop leader. It will not include practical hands-on exercises.

