Linkable Data, Linked Data, Text Encoding and the Need for Well-Defined Conceptual Models in the Digital Humanities (workshop)
Christian-Emil Ore
Christian-Emil Ore is an associate professor and head of the Unit for Digital Documentation (EDD) at the Department of Linguistics and Scandinavian Studies, University of Oslo. He has worked with digital methods in the humanities for 25 years: methods for cultural heritage documentation, (e-)lexicography and corpora, and text encoding and electronic text editions. He has participated in and coordinated long-term language documentation projects in Norway and in Southern Africa, served on scientific and advisory boards in the US, Germany and Scandinavia, chaired ICOM-CIDOC (2004–2010), co-chaired the TEI ontology SIG, and participated in the development of CIDOC-CRM and FRBRoo since 2002.
1. Background
Since the mid-1990s there has been growing interest in the design and use of conceptual models (ontologies) in library science as well as in Digital Humanities. In the text-oriented Digital Humanities, however, conceptual models and ontologies have been considered closer to database development than to text research. This was the prevailing view in the TEI community until recently. The introduction of Linked Data eight years ago (Berners-Lee 2009) has put more focus on what we may call “real world information” and how such information can be found in and extracted from textual resources.
Reproducibility of results is a core concept in text-based research, as in all research. The content of information systems and virtual reconstructions in the cultural heritage sector is to a large degree based directly on information deduced from text studies. In many cases the links from the information system back to the texts are not available, and such links may be difficult to re-establish. Even where it is possible to re-establish them, the process may be too expensive. These links are necessary to make the deductions reproducible, since they document how the conclusions are based on the texts.
Linked Data offers a simple way to publish data through an open and uniform interface, enabling others to link scholarly data resources. Thus Linked Data should be ideal for building resources in the Digital Humanities (Ore 1998).
The programmatic slogan of the Semantic Web and Linked Data community is: “Anyone can say anything about anything.” That is, anything can be linked. From a scholarly and scientific point of view this is not satisfactory. Information is generated through exclusion using meaningful distinctions according to a common conceptual model or formal ontology. Thus meaningful information integration in a scholarly field using the Linked Data mechanism requires a common conceptual model for the context in question.
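The contrast between unrestricted linking and model-governed linking can be sketched in a few lines of Python. The identifiers, classes and properties below (`ex:ibsen`, `Person`, `born_in` and so on) are hypothetical and are not taken from any published ontology. The sketch assumes only that a conceptual model declares, for each property, which class may appear as subject and which as object.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical typing of resources: identifier -> class.
TYPES = {
    "ex:ibsen": "Person",
    "ex:skien": "Place",
    "ex:peer_gynt": "Text",
}

# Hypothetical conceptual model: property -> (domain class, range class).
MODEL = {
    "born_in": ("Person", "Place"),
}

def conforms(t: Triple) -> bool:
    """Return True if the triple uses a modelled property with matching domain and range."""
    if t.predicate not in MODEL:
        return False
    domain, range_ = MODEL[t.predicate]
    return TYPES.get(t.subject) == domain and TYPES.get(t.obj) == range_

statements = [
    Triple("ex:ibsen", "born_in", "ex:skien"),    # fits the model
    Triple("ex:skien", "born_in", "ex:ibsen"),    # linkable, but domain and range are swapped
    Triple("ex:peer_gynt", "likes", "ex:skien"),  # property not defined in the model
]

# Only statements that respect the model's distinctions are kept for integration.
accepted = [t for t in statements if conforms(t)]
```

All three statements are valid links in the "anyone can say anything" sense; only the first survives once the distinctions of the model are applied.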
How should structured information, based on a reading of a text, be linked to the encoded text itself? It is important to base such linking on data standards evolved in the fields of text encoding and conceptual modelling. Thus, the understanding of text encoding represented by the TEI guidelines and the understanding of conceptual models represented by initiatives like the CIDOC CRM and FRBRoo should be combined.
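One possible way of keeping the link between a deduced statement and its textual evidence is sketched below, assuming that the encoded text carries `xml:id` attributes on the relevant TEI elements. The TEI fragment, the document URI and the statement vocabulary are all invented for illustration; this is a minimal sketch and not a recommended encoding practice.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical TEI fragment; the xml:id values are invented for illustration.
TEI_FRAGMENT = f"""
<p xmlns="{TEI_NS}" xml:id="p1">
  <persName xml:id="pn1">Henrik Ibsen</persName> was born in
  <placeName xml:id="pl1">Skien</placeName>.
</p>
"""

# Hypothetical URI under which the encoded text is published.
DOC_URI = "http://example.org/texts/letter1.xml"

def text_anchors(xml_source: str, doc_uri: str) -> dict[str, str]:
    """Map the content of each name element to a URI pointing at its xml:id."""
    root = ET.fromstring(xml_source)
    anchors = {}
    for tag in ("persName", "placeName"):
        for el in root.iter(f"{{{TEI_NS}}}{tag}"):
            xml_id = el.get(f"{{{XML_NS}}}id")
            if xml_id is not None and el.text:
                anchors[el.text.strip()] = f"{doc_uri}#{xml_id}"
    return anchors

anchors = text_anchors(TEI_FRAGMENT, DOC_URI)

# A statement deduced from the reading, carrying links back to the passages
# in the encoded text on which it is based.
statement = {
    "subject": "ex:ibsen",
    "predicate": "born_in",
    "object": "ex:skien",
    "evidence": [anchors["Henrik Ibsen"], anchors["Skien"]],
}
```

Because the evidence URIs point into the encoded text, a later reader can return from the information system to the exact passages that support the statement.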
A conceptual model or ontology is not a specification for a technical implementation, nor is it a closed vocabulary or a thesaurus. It should result from a conceptualisation of a domain and from ontological commitments based on this analysis, and it is usually expressed as a hierarchy of concepts connected by properties or relationships. Some important principles should be observed. First, the model should follow the open-world assumption. Secondly, the modelling process should be bottom-up, that is, it should start from the empirical data. Finally, the intension of the classes or concepts in the model should focus on identity, substance, unity and existence.
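Two of these notions, a hierarchy of concepts and the open-world assumption, can be illustrated with a small Python sketch. The class names and the recorded statement are hypothetical. The sketch assumes a single-inheritance hierarchy for brevity, which real ontologies need not have.

```python
from typing import Optional

# Hypothetical class hierarchy: class -> direct superclass (None for the root).
SUPERCLASS = {
    "Entity": None,
    "Actor": "Entity",
    "Person": "Actor",
    "Place": "Entity",
}

def is_a(cls: str, ancestor: str) -> bool:
    """Return True if cls equals ancestor or is a (transitive) subclass of it."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERCLASS.get(current)
    return False

# Statements recorded so far, as (subject, property, object).
KNOWN = {("ex:ibsen", "born_in", "ex:skien")}

def holds(statement: tuple[str, str, str]) -> Optional[bool]:
    """Open-world lookup: True if recorded, otherwise None (unknown), never False."""
    return True if statement in KNOWN else None
```

Under a closed-world reading, a statement missing from `KNOWN` would be treated as false. Under the open-world assumption it is merely not yet known, which is why `holds` returns `None` in that case.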
2. Workshop Outline
The workshop is divided into four main parts:
- Introduction to conceptual modelling and ontologies, Linked/able Data and encoded texts.
- Event-oriented modelling and data integration. An introduction to CIDOC-CRM (ISO 21127): background, purpose, design principles.
- A short introduction to the family of CRM extensions, especially FRBRoo, an object-oriented version of the library model FRBR. FRBRoo is a more detailed model for intellectual works and can be used for modelling metadata for the visual and performing arts; some examples will be given.
- Mapping data: the tool 3M and the format X3ML will be used as an example. The tool was originally developed for mapping data to the EDM (Europeana Data Model), was further refined as part of a British, Swedish and Greek project, and is now maintained in close connection with the CIDOC-CRM SIG.
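The idea behind event-oriented modelling and mapping can be sketched as follows: a flat source record is rewritten so that the birth becomes an entity of its own, connecting person, place and time. The source record, the `http://example.org/` namespace and the function name are hypothetical. The class and property labels follow CIDOC-CRM naming as far as the author of this sketch recalls it and should be checked against the official definition. The code is plain Python and does not use the 3M tool or X3ML syntax.

```python
# Hypothetical flat source record, as it might come from a catalogue database.
record = {
    "id": "p42",
    "name": "Henrik Ibsen",
    "birth_year": "1828",
    "birth_place": "Skien",
}

BASE = "http://example.org/"  # hypothetical namespace for minted identifiers

def map_person(rec: dict[str, str]) -> list[tuple[str, str, str]]:
    """Map a flat person record to event-centric triples with CRM-style labels."""
    person = f"{BASE}person/{rec['id']}"
    birth = f"{BASE}birth/{rec['id']}"
    span = f"{BASE}timespan/{rec['id']}"
    place = f"{BASE}place/{rec['birth_place']}"
    return [
        (person, "rdf:type", "crm:E21_Person"),
        (person, "rdfs:label", rec["name"]),
        # The birth is modelled as an event, not as attributes of the person.
        (birth, "rdf:type", "crm:E67_Birth"),
        (birth, "crm:P98_brought_into_life", person),
        (birth, "crm:P7_took_place_at", place),
        (place, "rdf:type", "crm:E53_Place"),
        (birth, "crm:P4_has_time-span", span),
        (span, "rdf:type", "crm:E52_Time-Span"),
        (span, "rdfs:label", rec["birth_year"]),
    ]

triples = map_person(record)
```

Because the event has its own identifier, records from other sources that describe the same birth can be attached to it, which is what makes the event a natural point of integration.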
The workshop is intended to be a tutorial with an active conversation between the participants and the workshop leader. It will not include practical hands-on exercises.
3. Selected Readings
- Berners-Lee, Tim, James Hendler and Ora Lassila. 2001. “The Semantic Web.” Scientific American, 284 (May): 34–43. doi:10.1038/scientificamerican0501-34.
- Bush, Vannevar. 1945. “As We May Think.” The Atlantic. July 1945.
- CIDOC-CRM. Available at http://www.cidoc-crm.org.
- Conklin, J. 1987. “Hypertext: An introduction and survey.” Computer. 20 (9): 17–42.
- Hyvönen, E., J. Tuominen, M. Alonen, and E. Mäkelä. 2014. “Linked data Finland: A 7-star model and platform for publishing and re-using linked datasets.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 8798: 226–230. https://link.springer.com/chapter/10.1007/978-3-319-11955-7_24. doi:10.1007/978-3-319-11955-7_24.
- Marketakis, Yannis, Nikos Minadakis, Haridimos Kondylakis, Konstantina Konsolaki, Georgios Samaritakis, Maria Theodoridou, Giorgos Flouris, and Martin Doerr. 2016. “X3ML mapping framework for information integration in cultural heritage and beyond.” International Journal on Digital Libraries. 1: 1–19. https://link.springer.com/article/10.1007/s00799-016-0179-1/fulltext.html. doi:10.1007/s00799-016-0179-1.
- Oldman, Dominic, Martin Doerr, and Stefan Gradmann. 2016. “Zen and the art of Linked Data: new strategies for a Semantic Web of humanist knowledge.” A New Companion to Digital Humanities, edited by Schreibman, Susan, Raymond George Siemens, and John Unsworth: 251–273. Chichester, UK: John Wiley & Sons, Ltd. doi:10.1002/9781118680605.ch18.
- Ore, C.-E., and O. Eide. 2009. “TEI and cultural heritage ontologies: Exchange of information?” Literary and Linguistic Computing. 24 (2): 161–172.
- Ritchie, Ian. 2011. “The day I turned down Tim Berners-Lee.” Available at https://www.ted.com/talks/ian_ritchie_the_day_i_turned_down_tim_berners_lee/transcript?language=en.
Bibliography
- Berners-Lee, Tim. 2009. “Linked Data.” Available at https://www.w3.org/DesignIssues/LinkedData.html.
- Ore, Christian-Emil. 1998. “Making multidisciplinary resources.” The Digital Demotic. Selected papers from DRH97, Digital Resources for the Humanities conference, St Anne’s college, Oxford, September 1997, edited by Burnard, Lou, Marilyn Deegan, and Harold Short: 65–74. London: Office for Humanities Communication.