Sex in the TEI: The TEI 2016 gender check (paper)
Vanessa Hannesschläger* and Peter Andorfer**
* Vanessa Hannesschläger is a researcher at the Austrian Centre for Digital Humanities of the Austrian Academy of Sciences (ACDH-OEAW), where she is responsible for legal issues. She is involved in several projects in which she works on data modelling, digital editing, and in the outreach department. In addition, she is completing her PhD with the German department of the University of Vienna. Her research interests include legal frameworks of digital research, biography theory, archive theory, modern Austrian literature, and the contemporary developments of gender issues in society. For more information, please visit http://vanessahannesschlaeger.wordpress.com/.
** Peter Andorfer studied history at Innsbruck University, where he finished a PhD in history with a thesis on the works of the Tyrolean peasant Leonhard Millinger (1753–1834). During an extended research period at the Herzog August Bibliothek in Wolfenbüttel (Lower Saxony, Germany), financed by a “Digital-Humanities Scholarship,” he published an online edition of Millinger’s main work The Depiction of the World. He has also worked on the topics “research data” and “scientific collections” in DARIAH-DE and maintains the webpage www.digital-archiv.at for developing and deploying different kinds of DH-projects.
The abstracts of the TEI Conference and Members’ Meeting 2016 were published by the
hosts (Austrian Centre for Digital Humanities / Austrian Academy of Sciences) as TEI-encoded
XML documents on GitHub (Hannesschläger and Schopper 2016). This, and the fact that these documents were published under a CC-BY-SA-4.0 license, made it possible to take these data and “play” with
them, for instance by building a web application to publish as well as analyse the data.
Among other things, the editors tagged the forenames of the authors with the corresponding
<forename> element. This allowed us to ask about the gender distribution among the
contributors to the conference. What started as a playful exercise in data mining, processing,
and analysis led to categorical questions about how to assign and especially how to
encode gender information for persons. We decided to genderize the <forename>s rather
than the <person>s and will explain this decision during our talk with reference to
contemporary gender theory.
As far as the alignment of forenames and gender is concerned, this is a simple task, at least
from a technical point of view. As described in detail on the tei2016app website (Andorfer and Hannesschläger 2016),
we first looked for a comprehensive and structured list of forenames that had already been mapped
to genders, e.g., a list of female forenames and a list of male forenames. Secondly, the
tagged <forename> of the respective TEI abstract had to be checked against these lists.
The first list of gendered names we found was the one provided by Mark Kantrowitz, which is
used, e.g., in the NLTK package (Kantrowitz 2017).
This list was ingested into a Django-based web service and
accessed by an XQuery script that iterated through all forename elements of the abstracts
corpus, sent each forename to the service’s endpoint, and stored the returned answer.
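The lookup step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Django service or the XQuery script used in the project: the two name sets, the sample abstract, and the function names `lookup_gender` and `check_forenames` are all invented for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical stand-ins for the two name lists (the real lists are far longer).
FEMALE_NAMES = {"vanessa", "anna"}
MALE_NAMES = {"peter", "daniel"}

# Invented sample document; not taken from the abstracts corpus.
SAMPLE_ABSTRACT = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>Sample abstract</title>
    <author><persName><forename>Vanessa</forename> <surname>Example</surname></persName></author>
    <author><persName><forename>Peter</forename> <surname>Sample</surname></persName></author>
    <author><persName><forename>Qwertz</forename> <surname>Unknown</surname></persName></author>
  </titleStmt></fileDesc></teiHeader>
</TEI>
"""


def lookup_gender(forename):
    """Return "female" or "male" if the name is on exactly one list, else None."""
    key = forename.strip().lower()
    in_female = key in FEMALE_NAMES
    in_male = key in MALE_NAMES
    if in_female and not in_male:
        return "female"
    if in_male and not in_female:
        return "male"
    return None  # not listed, or listed under both categories


def check_forenames(tei_xml):
    """Pair every <forename> in a TEI document with its lookup result."""
    root = ET.fromstring(tei_xml.strip())
    results = []
    for element in root.iter(f"{{{TEI_NS}}}forename"):
        name = (element.text or "").strip()
        results.append((name, lookup_gender(name)))
    return results


if __name__ == "__main__":
    for name, gender in check_forenames(SAMPLE_ABSTRACT):
        print(name, gender)
```

Names that appear on neither list, or on both, are returned as `None` here; how such cases should be treated is a design decision the sketch leaves open.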
While simple from a technical viewpoint, this approach was questionable from a gender studies
viewpoint because Kantrowitz does not provide information on how the list was compiled
or what criteria were applied to group names into the categories female, male, and pet.
Other sources, e.g., genderize.io, not only provide more data but also give information
about how the data was gathered and categorized. The most important argument for this
data source was genderize.io’s claim that its data was assembled by scraping social network profiles, where people can declare their gender
themselves.[1]
Solving the issue of finding an adequate data source led to the question of how to encode
this scraped information in a useful and TEI-conformant way. The <sex> element only allows
encoding assumptions about a person’s sex, and <gen> (gender) only the morphological gender of a
lexical item, but neither of these fits our needs, as we wanted to encode the gender a forename
is most commonly associated with. As it turned out, the broader issue of how to encode a
person’s sex has led to some lengthy debates in the TEI community,[2] none of which
consider the distinction between sex and gender (West and Zimmerman 1987)
or discuss the questionable practice of
assigning either to a person other than oneself. The discussions focus on which values
should be used (allowed) to encode a person’s sex but do not consider the question of whether
and how a forename element could or should be gendered.
For the current project, we “solved” this issue by encoding the <forename>’s gender with the
help of a @type attribute. Concerning the values of this attribute, we came across the same issues
that were discussed in the context of <sex>, e.g., should we encode gender information following
some (ISO) standard or choose custom/arbitrary values? Finally, we decided to choose the
values “female”, “male”, and “nomatch” (the latter for forenames that did not match any name
gendered by genderize.io or by genderchecker.com, which was used to reconcile names not
found in genderize.io).
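A minimal sketch of this encoding step, assuming the lookup results are already available locally: the dictionaries `PRIMARY` and `SECONDARY` are hypothetical stand-ins for answers obtained from genderize.io and genderchecker.com, and the sample fragment and function names are invented for illustration. The sketch sets a `type` attribute with one of the three values on every <forename>.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without a namespace prefix.
ET.register_namespace("", TEI_NS)

# Hypothetical lookup tables standing in for the two external sources.
PRIMARY = {"vanessa": "female", "peter": "male"}  # assumed genderize.io results
SECONDARY = {"daniel": "male"}                    # assumed genderchecker.com results

# Invented sample fragment; not taken from the abstracts corpus.
SAMPLE = """
<titleStmt xmlns="http://www.tei-c.org/ns/1.0">
  <author><persName><forename>Vanessa</forename> <surname>Example</surname></persName></author>
  <author><persName><forename>Daniel</forename> <surname>Sample</surname></persName></author>
  <author><persName><forename>Qwertz</forename> <surname>Unknown</surname></persName></author>
</titleStmt>
"""


def gender_value(forename):
    """Try the primary source first, then the secondary one, else "nomatch"."""
    key = forename.strip().lower()
    return PRIMARY.get(key) or SECONDARY.get(key) or "nomatch"


def annotate(tei_xml):
    """Return the document with a type attribute set on every <forename>."""
    root = ET.fromstring(tei_xml.strip())
    for element in root.iter(f"{{{TEI_NS}}}forename"):
        element.set("type", gender_value(element.text or ""))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(annotate(SAMPLE))
```

The fallback order mirrors the description above: the second table is only consulted for names the first one does not know.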
As a result, we can now say that 87 forenames of the contributors to the TEI conference
were male, 37 were female, and three were “no-matches”. Concerning authorship of the published
papers, there were 39 abstracts with more male than female authors’ forenames, 14
abstracts with more female than male, eight with an equal distribution, and two
with an unclear result (meaning that most names could not clearly be assigned a male or
female gender).
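The per-abstract comparison can be sketched as a small tally. This is an assumption-laden illustration, not the project’s actual code: the rule that an abstract counts as “unclear” when its unmatched forenames outnumber the matched ones is our reading of “most names”, and the category labels and sample data are invented.

```python
from collections import Counter


def classify_abstract(types):
    """Classify one abstract from the type values of its authors' forenames."""
    counts = Counter(types)
    female = counts["female"]
    male = counts["male"]
    nomatch = counts["nomatch"]
    # Assumption: "unclear" means unmatched names outnumber matched ones.
    if nomatch > female + male:
        return "unclear"
    if male > female:
        return "more male"
    if female > male:
        return "more female"
    return "equal"


# Invented sample data: one list of type values per abstract.
SAMPLE_ABSTRACTS = [
    ["male", "male", "female"],
    ["female"],
    ["male", "female"],
    ["nomatch"],
]

if __name__ == "__main__":
    tally = Counter(classify_abstract(types) for types in SAMPLE_ABSTRACTS)
    for category, number in tally.items():
        print(category, number)
```

Under a different reading of “unclear” (for example, any abstract containing at least one unmatched forename), only the first condition would need to change.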
Notes
1. However, it should be noted that we are not fully confident that the data was in fact gathered from social networks, because genderize.io only knows two genders, whereas platforms like Facebook already offer many more choices.
2. E.g., https://github.com/TEIC/TEI/issues/426
Bibliography
- Andorfer, Peter, and Vanessa Hannesschläger. 2016. “Gender distribution among the contributors to TEI 2016.” tei2016app. https://tei2016app.acdh.oeaw.ac.at/pages/show.html?document=genderize.xml&directory=meta&stylesheet=meta.
- Hannesschläger, Vanessa, and Daniel Schopper. 2017. “Book of Abstracts in TEI XML.” TEI Conference and Members’ Meeting 2016. https://github.com/acdh-oeaw/TEI2016abstracts.
- Kantrowitz, Mark. 2017. “Name Corpus: List of Male, Female, and Pet names.” CMU Artificial Intelligence Repository. Last modified: 02-Apr-1997. http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/
- West, Candace, and Don H. Zimmerman. 1987. “Doing Gender.” Gender and Society 1 (2): 125–151.