TEI 2017 Victoria, British Columbia, Canada November 11 - 15

XML Tues Nov 14, 14:10–15:30

How to Stop Global Warming by Means of TEI, or: Saving the rainforest by building a digital book of abstracts (poster)

Peter Andorfer and Vanessa Hannesschläger

Peter Andorfer studied history at Innsbruck University, where he completed a PhD with a thesis on the works of the Tyrolean peasant Leonhard Millinger (1753–1834). During an extended research period at the Herzog August Bibliothek in Wolfenbüttel (Lower Saxony, Germany), financed by a “Digital-Humanities Scholarship,” he published an online edition of Millinger’s main work The Depiction of the World. He has also worked on the topics “research data” and “scientific collections” in DARIAH-DE and maintains the webpage www.digital-archiv.at for developing and deploying different kinds of DH projects.

Vanessa Hannesschläger is a researcher at the Austrian Centre for Digital Humanities of the Austrian Academy of Sciences (ACDH-OEAW), where she is responsible for legal issues. She is involved in several projects in which she works on data modelling, digital editing, and in the outreach department. In addition, she is completing her PhD with the German department of the University of Vienna. Her research interests include legal frameworks of digital research, biography theory, archive theory, modern Austrian literature, and contemporary developments of gender issues in society. For more information, please visit http://vanessahannesschlaeger.wordpress.com/ .

The tei2016app (Austrian Centre for Digital Humanities) is a joint effort to publish the abstracts of last year’s TEI conference as TEI documents, leveraging the advantages that we believe come with encoding texts according to the TEI Guidelines. The application can also be considered a “proof of concept,” as the tei2016app is based on the blog “HowTo create your own digital edition web app” (Andorfer and Kampkaspar 2016), which was officially introduced to the TEI community with a poster at last year’s conference. This blog covers the main technical aspects of the application, which is built on eXist-db 1 and leverages eXist-db’s built-in application framework. The main application logic is implemented in XQuery, single XML/TEI documents are transformed with XSLT, and client-side features are realized with JavaScript libraries. With an existing framework at hand, the main challenges in building the tei2016app were therefore not technical ones. Unfortunately, the same cannot be said of the data. After the abstracts had been submitted via ConfTool, 2 our colleagues at the Austrian Centre for Digital Humanities of the Austrian Academy of Sciences exported them, mostly manually (copy & paste), into a Word file in which the texts were edited (unification of spelling and citation style, correction of typos, etc.). The index (including names, affiliations, and page numbers of works by the authors) was also edited manually. InDesign was used to produce the proofs for the printed book of abstracts. The InDesign file was then exported to PDF and published online as a complete PDF. 3
After the conference, the InDesign file was exported to XML. The abstracts were split into individual XML files and subsequently edited in oXygen 4 (partly automatically, partly manually) until they were proper TEI files. These files were published on GitHub (Hannesschläger and Schopper 2016).
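The splitting step can be illustrated with a minimal Python sketch. The structure of the InDesign XML export is not described here, so the input element names (`abstract`, `title`, `p`) and the function name `split_abstracts` are hypothetical. The sketch only wraps each abstract in a minimal TEI skeleton (a `teiHeader` with `titleStmt`, `publicationStmt` and `sourceDesc`, plus `text/body`); the manual editing in oXygen is not modelled.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI elements with a default xmlns


def tei(tag: str) -> str:
    """Qualify a local name with the TEI namespace (Clark notation)."""
    return f"{{{TEI_NS}}}{tag}"


def split_abstracts(export_xml: str) -> dict[str, str]:
    """Split one (hypothetical) export document into one TEI string per abstract."""
    root = ET.fromstring(export_xml)
    files = {}
    for n, abstract in enumerate(root.iter("abstract"), start=1):
        doc = ET.Element(tei("TEI"))
        header = ET.SubElement(doc, tei("teiHeader"))
        file_desc = ET.SubElement(header, tei("fileDesc"))
        title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
        title = (abstract.findtext("title") or "").strip()
        ET.SubElement(title_stmt, tei("title")).text = title
        pub = ET.SubElement(file_desc, tei("publicationStmt"))
        ET.SubElement(pub, tei("p")).text = "Draft"
        src = ET.SubElement(file_desc, tei("sourceDesc"))
        ET.SubElement(src, tei("p")).text = "Converted from an InDesign XML export"
        body = ET.SubElement(ET.SubElement(doc, tei("text")), tei("body"))
        for para in abstract.findall("p"):
            # Inline formatting is flattened to plain text; it would need manual repair.
            ET.SubElement(body, tei("p")).text = "".join(para.itertext()).strip()
        files[f"abstract_{n:03d}.xml"] = ET.tostring(doc, encoding="unicode")
    return files
```

Returning strings rather than writing files keeps the sketch easy to test; in practice each value would be written to its own file and then validated against a TEI schema before any manual editing.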
In retrospect, we may ask whether this workflow was the most effective one, or whether a direct conversion from ConfTool to TEI (and from there to printable PDFs) would have been better. 5 Either way, substantial manual data cleaning was unavoidable. Since no formatting requirements had been specified in advance, the submitted abstracts, apart from their length, took quite heterogeneous forms. A set of guidelines or templates provided by the conference committee beforehand would have simplified the later editorial work.
A lot of work was also invested in gathering basic data about the authors of the abstracts. Although conference participants had to fill in a rather lengthy form to submit an abstract, the data gathered is quite heterogeneous, because ConfTool neither provides a closed vocabulary to choose from (e.g., for keywords) nor asks for links to common norm data records (e.g., geonames.org, viaf.org, orcid.org). Another obstacle to efficient data extraction from ConfTool is the lack of obvious and customizable data export options.
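To illustrate what asking for norm data at submission time could add: ORCID iDs end in a check character computed with ISO 7064 MOD 11-2, so a submission form could reject mistyped iDs before they reach the editors. The following Python sketch is hypothetical (it is not a ConfTool feature); it checks only syntax and checksum, not whether the iD is actually registered.

```python
import re

# Four hyphen-separated groups of four; only the final character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Hypothetical form-side check: syntax plus checksum, no registry lookup."""
    orcid = orcid.strip().upper()
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` (the fictitious test iD used in ORCID's documentation) returns `True`, while changing its last digit to `8` makes it return `False`.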
This resulted in considerable extra (manual) work, as entities had to be disambiguated and linked to the aforementioned norm data records in order to enable consistent linking of related entities (e.g., an abstract was written by a person, a person is affiliated with an institution, and an institution is located in a country) and to derive analyses from these links.
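The chain of relations described above (abstract, person, institution, country) can be sketched as a small graph in Python. The identifiers are placeholders standing in for norm data URIs (e.g., from viaf.org, orcid.org or geonames.org), and `abstracts_per_country` is a hypothetical example of an analysis that only works once every entity has been disambiguated to a single key.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConferenceGraph:
    # Placeholder IDs; in practice these would be norm data URIs.
    wrote: dict[str, list[str]]  # abstract ID -> author IDs
    affiliated: dict[str, str]   # author ID -> institution ID
    located_in: dict[str, str]   # institution ID -> country code


def abstracts_per_country(g: ConferenceGraph) -> Counter:
    """Count abstracts per country via abstract -> person -> institution -> country."""
    counts: Counter = Counter()
    for authors in g.wrote.values():
        # A set, so an abstract with several authors from one country counts once.
        countries = {g.located_in[g.affiliated[a]] for a in authors}
        counts.update(countries)
    return counts
```

With free-text affiliations, a variant spelling of an institution name would be missing from `located_in` (raising a `KeyError` here) or would silently split the counts; one disambiguated identifier per entity avoids both.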
Unique identifiers for the literature referenced in the abstracts would also have been very useful, as they would allow interesting analyses of which papers are cited by whom in the TEI community, and how often.
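Assuming the referenced works carried DOIs, the analysis suggested above reduces to counting normalized identifiers. In this minimal Python sketch the DOIs and abstract IDs are invented placeholders, and the normalization covers only a few common prefix variants plus letter case (DOI names are case-insensitive).

```python
from collections import Counter, defaultdict

# Common ways a DOI is written in reference lists (not exhaustive).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Strip a resolver or 'doi:' prefix and lowercase the DOI name."""
    s = raw.strip()
    for prefix in DOI_PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):]
            break
    return s.lower()


def citation_stats(refs: dict[str, list[str]]):
    """refs maps an abstract ID to the DOIs it cites.

    Returns (how often each DOI is cited, which abstracts cite it)."""
    counts: Counter = Counter()
    cited_by: defaultdict = defaultdict(set)
    for abstract, dois in refs.items():
        # Deduplicate within one abstract so repeated references count once.
        for doi in {normalize_doi(d) for d in dois}:
            counts[doi] += 1
            cited_by[doi].add(abstract)
    return counts, cited_by
```

Mapping abstract IDs to authors (as in the entity links above) would turn "cited by which abstract" into "cited by whom".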
The tei2016app brought together existing components and showcased what modern open source and open access publishing of conference proceedings could look like. However, it also demonstrated that this kind of publication is not only a matter of providing text that is highly standardized in formatting and citation style, but depends above all on integrating norm data records to avoid the ambiguity of natural language. The latter could be enabled by improvements to tools such as ConfTool. If further steps in this direction are taken, we are confident that printed books of abstracts can be abandoned altogether; instead, publishing the abstracts of future conferences as TEI files in a web app framework such as the one we will show in this poster will eliminate the humanities’ contribution to global warming. In addition, we provide a showcase not only for sustainability through reusability, but also for the benefits of early access, which in this case allowed us to fix serious security issues thanks to the community’s immediate response. 6

Notes

  1. http://exist-db.org/exist/apps/homepage/index.html
  2. http://www.conftool.net/
  3. http://tei2016.acdh.oeaw.ac.at/sites/default/files/TEIconf2016_BookOfAbstracts.pdf
  4. https://www.oxygenxml.com/
  5. A somewhat more automated transformation is promised by the DHConvalidator (https://github.com/mpetris/dhconvalidator), used, e.g., for the last two DHd conferences (2017, 2016) and for DH2016. Interestingly, however, to our knowledge no TEI-encoded versions of these abstracts have been published so far, only PDFs.
  6. See the discussion “your eXist-db is an open proxy” initiated by Mathias Göbel on the TEI mailing list: https://listserv.brown.edu/archives/cgi-bin/wa?A1=ind1703&L=TEI-L#44.

Bibliography