How to Stop Global Warming by Means of TEI, or: Saving the rainforest by building a digital book of abstracts (poster)
Peter Andorfer and Vanessa Hannesschläger
Peter Andorfer studied history at Innsbruck University, where he finished a PhD in history with a thesis on the works of the Tyrolean peasant Leonhard Millinger (1753–1834). During an extended research period at the Herzog August Bibliothek in Wolfenbüttel (Lower Saxony, Germany), financed by a “Digital-Humanities Scholarship,” he published an online edition of Millinger’s main work The Depiction of the World. He has also worked on the topics “research data” and “scientific collections” in DARIAH-DE and maintains the webpage www.digital-archiv.at for developing and deploying different kinds of DH projects.
Vanessa Hannesschläger is a researcher at the Austrian Centre for Digital Humanities of the Austrian Academy of Sciences (ACDH-OEAW), where she is responsible for legal issues. She is involved in several projects in which she works on data modelling, digital editing, and in the outreach department. In addition, she is completing her PhD with the German department of the University of Vienna. Her research interests include legal frameworks of digital research, biography theory, archive theory, modern Austrian literature, and the contemporary developments of gender issues in society. For more information, please visit http://vanessahannesschlaeger.wordpress.com/.
The tei2016app (Austrian Centre for Digital Humanities 2016) is a joint effort to publish the abstracts of last year’s TEI conference as TEI
documents, leveraging the advantages which we believe come with encoding texts
according to the TEI Guidelines. Additionally, this application can be considered a “proof
of concept,” as the tei2016app is based upon the blog “HowTo create your own digital
edition web app” (Andorfer and Kampkaspar 2016), which was officially introduced to the TEI community with a poster at last
year’s conference. This blog covers the main technical aspects of the application, which is
based upon eXist-db[1] and leverages eXist-db’s built-in application-building framework. The main
application logic is implemented in XQuery, the transformation of individual XML/TEI documents
is accomplished with XSLT, and client-side features are realized with JavaScript libraries.
With an existing framework at hand, the main challenges in building the tei2016app
were not so much technical ones. Unfortunately, the same cannot be said of the data. After the abstracts had been submitted via ConfTool,[2] our colleagues at the Austrian Centre
for Digital Humanities of the Austrian Academy of Sciences exported them, mostly manually
(copy & paste), into a Word file, in which the texts were edited (standardization of spelling and
citation style, typo correction, etc.). The index (including names, affiliations, and page
numbers of works by the authors) was also edited manually. For the production of the proofs
for the printed book of abstracts, InDesign was used. The InDesign file was then exported to
PDF, and this file was published online as a full PDF.[3]
After the conference, the InDesign file was exported to XML. The abstracts were split up into
individual XML files and subsequently edited in oXygen[4] (partly automatically, partly
manually) to become proper TEI files. These files were published on GitHub (Hannesschläger and Schopper 2016).
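The splitting step described above can be illustrated with a minimal Python sketch. It is not the workflow actually used (the files were edited in oXygen), and the structure of the InDesign export is not documented here, so the element names `Root`, `abstract`, `title`, and `p` in the sample input are assumptions made purely for illustration. The resulting TEI skeleton is deliberately minimal and would not validate without further header elements.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)

# Hypothetical stand-in for the InDesign XML export; the element names
# are assumptions, not the real export format.
EXPORT = """<Root>
  <abstract><title>First talk</title><p>Text of the first abstract.</p></abstract>
  <abstract><title>Second talk</title><p>Text of the second abstract.</p></abstract>
</Root>"""


def tei(tag):
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def split_export(xml_text):
    """Return one minimal TEI element tree per <abstract> in the export."""
    docs = []
    for abstract in ET.fromstring(xml_text).iter("abstract"):
        root = ET.Element(tei("TEI"))
        header = ET.SubElement(root, tei("teiHeader"))
        file_desc = ET.SubElement(header, tei("fileDesc"))
        title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
        ET.SubElement(title_stmt, tei("title")).text = abstract.findtext("title")
        text = ET.SubElement(root, tei("text"))
        body = ET.SubElement(text, tei("body"))
        # Copy each paragraph of the abstract into the TEI body.
        for para in abstract.iter("p"):
            ET.SubElement(body, tei("p")).text = para.text
        docs.append(root)
    return docs


docs = split_export(EXPORT)
```

Each tree in `docs` could then be written to its own file, for example with `ET.ElementTree(doc).write(path, encoding="utf-8", xml_declaration=True)`, before the manual clean-up that real data would still require.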
In retrospect, we dare to ask whether this workflow was the most effective, or whether a
direct conversion from ConfTool to TEI (and from there to printable PDFs) would have been
better.[5] Either way, there was no way around some quite substantial manual data
cleaning. Since no formatting requirements were given in advance, the submitted
abstracts took, aside from their length, quite heterogeneous forms. Here, a set of guidelines or templates provided by the conference committee beforehand would have simplified the
editorial work later.
A lot of work was also invested in gathering basic data about the authors of the abstracts.
Although conference participants had to fill in quite a lengthy form for submitting an abstract,
the data gathered is quite heterogeneous because ConfTool neither provides a closed list of
vocabularies to choose from (e.g., keywords), nor asks for linkage to common norm data
records (e.g., geonames.org, viaf.org, orcid.org). Another obstacle for efficient data extraction
from ConfTool is the lack of (at least obvious and customizable) data export options.
This resulted in considerable (manual) extra work, as entities had to be disambiguated
and linked to the aforementioned norm data records in order to enable consistent linking of
related entities (e.g., an abstract was written by a person, a person is related to an affiliated
institution, and an institution is located in a country) and to derive analyses from these
links.
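The chain of relations described above (abstract, person, institution, country) can be made concrete with a small Python sketch. It does not reflect the data model of the tei2016app; the class names, the `norm_id` field, and all sample values are hypothetical, and the identifiers are placeholders rather than real viaf.org, orcid.org, or geonames.org records. It only illustrates why a shared identifier makes disambiguation and derived analyses straightforward.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Institution:
    name: str
    country: str  # in practice this could be a geonames.org identifier


@dataclass(frozen=True)
class Person:
    name: str
    norm_id: str  # placeholder for e.g. a viaf.org or orcid.org identifier
    affiliation: Institution


@dataclass(frozen=True)
class Abstract:
    title: str
    authors: tuple


def distinct_authors(abstracts):
    """Identify authors by norm_id, so spelling variants collapse into one entity."""
    return {author.norm_id for abstract in abstracts for author in abstract.authors}


def abstracts_per_country(abstracts):
    """Count each abstract once for every country its authors are affiliated with."""
    counts = Counter()
    for abstract in abstracts:
        counts.update({author.affiliation.country for author in abstract.authors})
    return counts


# Invented sample data, for illustration only.
inst_a = Institution("Institute A", "Austria")
inst_b = Institution("Institute B", "Germany")
person_1 = Person("A. Example", "person-0001", inst_a)
person_2 = Person("B. Example", "person-0002", inst_b)
# The same person under a different spelling, linked by the same norm_id.
person_1_variant = Person("Example, A.", "person-0001", inst_a)

abstracts = [
    Abstract("Talk one", (person_1, person_2)),
    Abstract("Talk two", (person_1_variant,)),
]
```

With these sample records, `distinct_authors(abstracts)` yields two entities although three name strings occur, which is exactly the ambiguity that has to be resolved by hand when no identifiers are collected at submission time.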
A very useful feature would have been unique identifiers for literature referenced in the
abstracts, as this would allow for interesting analyses about what papers were quoted by
whom in the TEI community and how often.
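To indicate what such an analysis might look like, the following Python sketch counts citations once every reference carries a unique identifier. The data is invented: the abstract ids and the `ref:` values are placeholders standing in for DOIs or similar identifiers, which the actual abstracts do not provide.

```python
from collections import Counter, defaultdict

# Invented sample data: abstract id -> identifiers of the works it references.
REFERENCES = {
    "abstract-01": ["ref:guidelines", "ref:paper-a"],
    "abstract-02": ["ref:guidelines"],
    "abstract-03": ["ref:paper-a", "ref:guidelines", "ref:guidelines"],
}


def citation_counts(references):
    """Count how many abstracts cite each work (repeats within one abstract count once)."""
    counts = Counter()
    for cited in references.values():
        counts.update(set(cited))
    return counts


def cited_by(references):
    """Map each work to the sorted list of abstracts that cite it."""
    index = defaultdict(list)
    for abstract_id, cited in sorted(references.items()):
        for work in set(cited):
            index[work].append(abstract_id)
    return dict(index)
```

Without identifiers, the same counts would depend on matching free-text citations, which vary in style and spelling from abstract to abstract.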
The tei2016app brought together existing components and showcased what modern-day
open-source and open-access conference proceedings publishing could look like. However,
it also demonstrated that this kind of publication is not only a matter of how to provide a
highly standardized text as far as formatting and citation styles are concerned, but especially
depends on the integration of norm data records to avoid the ambiguity of natural language.
The latter could be enabled by improvements to tools such as ConfTool. If further steps in
this direction are taken, we are confident that printed books of abstracts can be abandoned
altogether; instead, publishing abstracts of future conferences as TEI files and in a
web-app-framework such as the one we will show in this poster will enable the elimination of
the humanities’ contribution to global warming. In addition, we provide a showcase not only
for sustainability through reusability, but also for the benefits of early access, which in this
case allowed us to eliminate serious security issues thanks to immediate reactions by the
community.[6]
Notes
1. http://exist-db.org/exist/apps/homepage/index.html
2. http://www.conftool.net/
3. http://tei2016.acdh.oeaw.ac.at/sites/default/files/TEIconf2016_BookOfAbstracts.pdf
4. https://www.oxygenxml.com/
5. A somewhat more automated transformation is promised by the DHConvalidator (https://github.com/mpetris/dhconvalidator), used, e.g., for the last two DHd conferences (2017, 2016) and for DH2016. Interestingly enough, though, to our knowledge no TEI-encoded versions of those abstracts have been published so far, only some PDFs.
6. See the discussion “your eXist-db is an open proxy” initiated by Mathias Göbel on the TEI mailing list: https://listserv.brown.edu/archives/cgi-bin/wa?A1=ind1703&L=TEI-L#44.
Bibliography
- Andorfer, Peter, and Dario Kampkaspar. 2016. “HowTo create your own digital edition web app.” A Blog. TEI Abstracts 2016. https://tei2016app.acdh.oeaw.ac.at/pages/show.html?document=AndorferKampkaspar.xml&directory=editions&stylesheet=editions or https://tei2016app.acdh.oeaw.ac.at/data/editions/AndorferKampkaspar.xml.
- Austrian Centre for Digital Humanities. 2016. TEI Abstracts 2016. https://tei2016app.acdh.oeaw.ac.at/pages/index.html.
- ConfTool GmbH. 2017. ConfTool. Hamburg, Germany. http://www.conftool.net/.
- Exist Solutions. 2017. eXist-db. http://exist-db.org/exist/apps/homepage/index.html.
- Hannesschläger, Vanessa, and Daniel Schopper. 2017. “Book of Abstracts in TEI XML.” TEI Conference and Members’ Meeting 2016. https://github.com/acdh-oeaw/TEI2016abstracts.
- oXygen. 2017. oXygen XML Editor. Romania: SyncRO Soft SRL. https://www.oxygenxml.com/.
- Resch, Claudia, Vanessa Hannesschläger, and Tanja Wissik. 2016. TEI Conference and Members’ Meeting 2016: Book of Abstracts. Vienna, Austria: Austrian Academy of Sciences. http://tei2016.acdh.oeaw.ac.at/sites/default/files/TEIconf2016_BookOfAbstracts.pdf.