Creative Commons Attribution 4.0 International
No source, born digital.
TEI 2017 Conference Abstracts.
We decided to genderize the
The abstracts of the TEI Conference and Members’ Meeting 2016 were published by the
hosts (Austrian Centre for Digital Humanities / Austrian Academy of Sciences) as TEI
encoded XML documents on GitHub (Hannesschläger and Schopper 2016). This, and the fact that these documents were published under a CC-BY-SA-4.0 license, made it possible to take these data and
Among other things, the editors tagged the forenames of the authors with the corresponding
Aligning forenames with genders is a simple task, at least
from a technical point of view. As described in detail on the tei2016app website (Andorfer and Hannesschläger 2016), we first
looked for a comprehensive and structured list of forenames that have already been mapped
to genders, e.g., a list of female forenames and a list of male forenames. Secondly, the
tagged
The first list of gendered names we found was the one provided by Mark Kantrowitz, which is used, e.g., in the NLTK package (Kantrowitz 2017). This list was ingested into a Django-based web service and accessed by an XQuery script that iterates through all forename elements of the abstracts corpus, sends each forename to the service’s endpoint and stores the returned answer.
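The lookup loop described above can be illustrated with a minimal Python sketch. It is not the XQuery script or the Django service that was actually used: a small in-memory dictionary stands in for the name list and the web service, and the sample document, the names in it, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical stand-in for a gendered name list (not the Kantrowitz data).
NAME_LIST = {"anna": "female", "peter": "male"}

# Hypothetical TEI fragment with tagged forenames.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName><forename>Anna</forename> <surname>Example</surname></persName></p>
    <p><persName><forename>Peter</forename> <surname>Sample</surname></persName></p>
    <p><persName><forename>Xyzzy</forename> <surname>Unlisted</surname></persName></p>
  </body></text>
</TEI>"""


def lookup_gender(forename, name_list=NAME_LIST):
    """Return the gender recorded for a forename, or 'unknown' if it is not listed."""
    return name_list.get(forename.strip().lower(), "unknown")


def genderize_forenames(tei_xml):
    """Iterate over all TEI forename elements and pair each name with a lookup result."""
    root = ET.fromstring(tei_xml)
    results = []
    for element in root.iter(f"{{{TEI_NS}}}forename"):
        name = (element.text or "").strip()
        results.append((name, lookup_gender(name)))
    return results


if __name__ == "__main__":
    for name, gender in genderize_forenames(SAMPLE_TEI):
        print(name, gender)
```

Names missing from the list are reported as "unknown" rather than being forced into one of the two categories.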
Although simple from a technical viewpoint, this approach was questionable from a gender studies
viewpoint, because Kantrowitz does not provide information on how the list was compiled
or what criteria were applied to group names into the categories
Other sources, such as genderize.io, not only provide more data but also give information
about how the data was gathered and categorized. The most important argument for this
data source was genderize.io’s claim that its data was assembled by scraping social network
profiles, where people can declare their gender themselves.
However, we are not fully confident that the data was actually gathered from social networks,
because genderize.io only knows two genders, whereas platforms like Facebook already offer many more choices.
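To make the shape of such a lookup concrete, the following Python sketch builds a query URL for genderize.io and parses a response body. It assumes the public endpoint `https://api.genderize.io` with a `name` query parameter and a JSON answer containing `name`, `gender`, `probability` and `count`, as far as we understand the service. The sample response values and the function names are made up for illustration, and no network request is performed.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoint of the service.
API_BASE = "https://api.genderize.io"

# Illustrative response body; the numbers are invented, not real API output.
SAMPLE_BODY = '{"name": "anna", "gender": "female", "probability": 0.98, "count": 1234}'


def build_query_url(forename):
    """Build the query URL for a single forename."""
    return f"{API_BASE}?{urlencode({'name': forename})}"


def parse_response(body):
    """Parse a JSON response body; a missing or null gender becomes 'unknown'."""
    data = json.loads(body)
    return {
        "name": data.get("name"),
        "gender": data.get("gender") or "unknown",
        "probability": data.get("probability"),
        "count": data.get("count"),
    }


if __name__ == "__main__":
    print(build_query_url("Anna"))
    print(parse_response(SAMPLE_BODY))
```

The single `gender` field in the assumed response reflects the limitation noted above: each answer carries at most one of two values, or none.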
Solving the issue of finding an adequate data source led to the question of how to encode
this scraped information in a useful and TEI-conformant way. The E.g.,
For the current project, we
As a result, we can now say that 87 forenames of the contributors to the TEI conference
were male, 37 female and three