Using Markup for Multivariate Analyses in the Prosopographical Study "Formation for the Public Sphere"
Monica Langerth Zetterman
monica.langerth@ped.uu.se
Digital Literature, Uppsala University

Introduction
This paper aims to illustrate how markup can be applied for multiple purposes in research. [Note 1: Parts of the content in this paper are based on work in progress, a forthcoming article co-authored with Prof. Donald Broady, titled TEI markup as research tool in the prosopographic study Formation for the public sphere. In our collaborative work Prof. Broady is responsible for the sociological and historical content, and the author of this paper for the markup, its application and the statistical analyses.] Here, the TEI/XML encoding scheme [Note 2: See and Sperberg-McQueen & Burnard ] was used as a research tool when producing a collective biography in a sociological prosopographical study of prominent Swedish female pioneers around 1900. [Note 3: See . The project is directed by Donald Broady and funded by the Bank of Sweden Tercentenary Foundation.] In this collective biography the markup is used to explore biographical information. Although the markup was added for a specific purpose, namely to extract data for multivariate analysis methods such as correspondence analysis [Note 4: l’Analyse des Données, introduced by Jean-Paul Benzécri, a geometer-statistician, in the 1960s. The method models data sets as clouds of points in multidimensional Euclidean spaces and then interprets the data through these clouds of points (Lebart et al.). Cf. Bourdieu (1984) for applications and some explanations.], it also provides means for presenting, filtering and indexing the material.

Background
The main purpose of the project "Formation for the public sphere: A Collective Biography of Stockholm Women 1880–1920" is to investigate the social strategies of the first generations of women entering the public sphere in Sweden. This period was of crucial significance for women engaged in philanthropy, reform pedagogy, modern health care, literature and music. These women’s strategies, investments and careers differed from those of their male contemporaries, and their contributions are not easily recognisable. In order to discern and interpret their contributions to the establishment of modern welfare state institutions, a modern educational system and the modern cultural fields, we have used methods from the French sociological tradition founded by Pierre Bourdieu. A central endeavour is to collect information on the women’s social origin, social relations, networks, educational trajectories and matrimonial status. Such information is here called assets or capital. In Bourdieu’s sense, certain types of capital are acknowledged within certain social groups but not by everyone (Bourdieu, 1992). Each sufficiently autonomous field has its own rules for inclusion, exclusion and reward, and its own specific species of capital. [Note 5: See Broady for a proposed definition of Bourdieuan prosopography. See also Bourdieu’s study of the French academic field, Homo Academicus (Bourdieu, 1984), for an example of Bourdieu’s prosopography.] By analysing the distribution of certain types of capital among the pioneer women, we try to map the structure, the hierarchies and the polarities of domains such as female culture, education and philanthropy. Since we favour collecting sociologically interpretable data, it is important to record names, dates and places, e.g. where, when and with whom a woman lived and where she worked, in order to trace the "meeting places" and networks.
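To make the correspondence analysis mentioned in the introduction more concrete, here is a minimal sketch of its core computation in plain Python, with no external libraries. It assumes, hypothetically, that the encoded capital descriptions have already been reduced to a contingency table whose rows are women (or groups of women) and whose columns are counts of capital indicators. The function name `correspondence_first_axis`, the `FirstAxis` container and the power-iteration shortcut are illustrative choices, not the project's own procedure. The sketch computes the standardized residuals, whose sum of squares is the total inertia (the chi-square statistic divided by the table total), and then extracts only the first principal axis with principal coordinates for rows and columns.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FirstAxis:
    total_inertia: float      # chi-square statistic / grand total
    eigenvalue: float         # inertia captured by the first axis
    row_coords: List[float]   # principal coordinates of rows (e.g. women)
    col_coords: List[float]   # principal coordinates of columns (e.g. capital indicators)


def correspondence_first_axis(table: List[List[float]], iterations: int = 1000) -> FirstAxis:
    """First axis of a simple correspondence analysis (illustrative sketch only)."""
    n = float(sum(sum(row) for row in table))
    n_rows, n_cols = len(table), len(table[0])
    # Correspondence matrix P = N / n, row masses r and column masses c.
    p = [[x / n for x in row] for row in table]
    r = [sum(row) for row in p]
    c = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]
    if min(r) <= 0 or min(c) <= 0:
        raise ValueError("every row and column needs a positive total")
    # Standardized residuals S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j).
    s = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(n_cols)]
         for i in range(n_rows)]
    total_inertia = sum(x * x for row in s for x in row)
    # M = S^T S; its leading eigenvector v is the first right singular vector of S.
    m = [[sum(s[i][a] * s[i][b] for i in range(n_rows)) for b in range(n_cols)]
         for a in range(n_cols)]

    def times_m(vec: List[float]) -> List[float]:
        return [sum(m[a][b] * vec[b] for b in range(n_cols)) for a in range(n_cols)]

    # Power iteration from an arbitrary non-constant unit start vector.
    v = [float(j + 1) for j in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = times_m(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # M v = 0: nothing to iterate on (e.g. rows and columns independent)
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v = squared first singular value = inertia of axis 1.
    eigenvalue = sum(a * b for a, b in zip(v, times_m(v)))
    sigma = math.sqrt(max(eigenvalue, 0.0))
    # Principal coordinates: f_i = (S v)_i / sqrt(r_i), g_j = sigma * v_j / sqrt(c_j).
    sv = [sum(s[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    row_coords = [sv[i] / math.sqrt(r[i]) for i in range(n_rows)]
    col_coords = [sigma * v[j] / math.sqrt(c[j]) for j in range(n_cols)]
    return FirstAxis(total_inertia, eigenvalue, row_coords, col_coords)


# Invented 2 x 2 toy table: two groups of women x two capital indicators.
result = correspondence_first_axis([[30, 10], [10, 30]])
print(result)
```

In practice one would use a statistics package that returns all axes at once through a singular value decomposition; power iteration is used here only to keep the example self-contained, and it assumes the first eigenvalue is well separated from the second. Rows and columns that end up with coordinates of the same sign on the axis are the ones that co-occur more often than independence would predict.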
Hence a mandatory core set of data was, whenever possible, harvested to depict some of the most crucial assets:
• Social origin: father’s and mother’s occupations, education and positions; number of brothers and sisters; the woman’s and her parents' places of birth and upbringing.
• Educational capital: kind of basic and further education; sojourns abroad.
• Social capital: influential relatives, matrimonial status, number of children, housing, membership of state commissions and foundations.
• Economic capital: wealth, material possessions and relations to patrons.
• Political and religious capital: positions in political or religious organisations, and standpoints in such matters.
• Specific symbolic capital: assets valued either within certain fields or domains or within women’s networks.

Of course, many of the biographical texts cover much more, but the main aim is to make the collection of these core data as comprehensive as possible for each woman.

Modelling the Data
The biographical accounts, called capital descriptions, form two datasets. One set consists of one hundred fairly extensive texts written in running prose by the researchers, intended for publication in, for example, historical journals or biographical handbooks. The scholars have explored archival material such as letters, diaries and estate inventories, as well as printed newspapers, journals and books, and of course existing biographies and autobiographies. The other set comprises more than 1,200 condensed texts based on excerpted information. [Note 6: Provided that the copyright issues can be resolved, a freely available digital version should eventually be published. Meanwhile, access is restricted to the researchers and to teaching purposes.] The excerpts have been transcribed from two volumes of biographical articles on prominent Swedish women, one from 1914 and one from 1921.
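Because the core categories listed above are meant to be collected for every woman, the markup can also be used to check how complete each record is. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree` module. It assumes, hypothetically, that each woman is encoded as a `div` with `type="woman"` and that each kind of capital is a nested `div` (at any depth) whose `type` value names the category. The category names, the `missing_core_data` function and the two-record sample are invented for illustration and are not taken from the project's encoding. The sample uses un-namespaced element names; a TEI P5 document would need the TEI namespace in the element paths.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical type values for the six core capital categories.
CORE_CATEGORIES = (
    "socialOrigin",
    "educationalCapital",
    "socialCapital",
    "economicCapital",
    "politicalReligiousCapital",
    "symbolicCapital",
)


def missing_core_data(tei_body: str) -> Dict[str, List[str]]:
    """Map each woman's identifier to the core categories not yet encoded for her."""
    root = ET.fromstring(tei_body.strip())
    report: Dict[str, List[str]] = {}
    for woman in root.iter("div"):
        if woman.get("type") != "woman":
            continue
        # Collect the type values of all nested subdivisions (any depth, excluding the woman's own div).
        present = {d.get("type") for d in woman.findall(".//div")}
        key = woman.get("n") or woman.findtext("head/persName", default="?")
        report[key] = [cat for cat in CORE_CATEGORIES if cat not in present]
    return report


# Invented sample with two placeholder women.
SAMPLE = """
<body>
  <div type="woman" n="A">
    <head><persName>Woman A</persName></head>
    <div type="socialOrigin">
      <p>Born in <placeName>Stockholm</placeName> in <date>...</date>.</p>
    </div>
    <div type="educationalCapital"><p>...</p></div>
  </div>
  <div type="woman" n="B">
    <head><persName>Woman B</persName></head>
    <div type="socialCapital"><p>...</p></div>
  </div>
</body>
"""

for woman_id, missing in missing_core_data(SAMPLE).items():
    print(woman_id, "missing:", ", ".join(missing) or "nothing")
```

Counting, rather than merely detecting, the encoded categories in the same traversal would yield the kind of woman-by-capital table that a multivariate analysis takes as input.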
Both kinds of text have been marked up according to the TEI Guidelines, using the additional TEI tag set "Names and Dates" to encode proper names, date periods and precise dates. [Note 7: cf. Sperberg-McQueen and Burnard, 2002, pp. 499–516] Like the much more extensive and ambitious Orlando project [Note 8: For information on the Orlando project, documenting "the scholarly history of women's writing in the British Isles", see . See also e.g. Grundy et al.], we produce texts and apply descriptive, model-driven and interpretative markup. Unlike the Orlando project, however, we have not developed a DTD for this project. In our content model each woman corresponds to a main division that contains subdivisions and further subdivisions. In principle each subdivision or sub-subdivision corresponds to one type of capital, such as