State of the Comédie-Française Project
Author: Deniz Aydin
Minutes: 10
This project keeps its information in one main repository, which can be found on our GitHub page. The JsonToXmlProsopography class in generate_prosopography.py converts JSON data into XML, specifically a structured TEI (Text Encoding Initiative) format for managing comedian and author data. When run from the repository root, it converts the JSON files representing comedians and authors into a prosopography XML file at output/prosopography.xml. Similarly, the JsonToXmlPlays class in generate_plays.py converts the JSON files representing authors and plays into an XML file at output/plays.xml when run.
Methods in the JsonToXmlProsopography class:
parse_comedians_jsons()
Parameters:
- template (default: templates/template_prosopography.xml): The path to the XML template that will be used to generate the final XML file.
- comedians_file (default: json_exports/comédiens.json): The path to the JSON file containing the data for comedians.
- authors_file (default: json_exports/auteurs.json): The path to the JSON file containing the data for authors.
- output_file (default: output/prosopography.xml): The path where the final XML output will be saved.
Functionality:
- Parsing the Template: the method begins by parsing the provided XML template (template).
- Loading Comedian Data: it opens the comedians_file, reads the data, and creates a Comedian object for each entry.
- Populating XML with Comedians: it creates a <person> element for each comedian in the template XML and adds relevant child elements (e.g., <idno>, <persName>, <occupation>, etc.) based on that comedian’s data.
- Loading Author Data: similarly, the method loads data from the authors_file and creates an Author object for each entry.
- Populating XML with Authors: the method then creates a <person> element for each author and adds relevant child elements (e.g., <idno>, <persName>, <gender>, etc.).
- Generating the Output: the method applies indentation to prettify the XML output and saves the final XML document to the output_file.
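The steps above can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration under stated assumptions: the inline template, the listPerson container, the JSON field names, and the build_prosopography function are hypothetical and are not taken from the repository. TEI namespaces are omitted for brevity.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for templates/template_prosopography.xml;
# the real template's structure is not shown in this document.
TEMPLATE = "<TEI><text><body><listPerson/></body></text></TEI>"

# Hypothetical record; the field names are assumptions, not the real JSON schema.
COMEDIANS_JSON = '[{"id": 1, "name": "Dupont", "occupation": "comédien"}]'


def build_prosopography(template_xml, comedians_json):
    """Parse a template, add one <person> per record, and indent the tree."""
    root = ET.fromstring(template_xml)
    list_person = root.find(".//listPerson")
    for record in json.loads(comedians_json):
        person = ET.SubElement(list_person, "person")
        ET.SubElement(person, "idno").text = str(record["id"])
        ET.SubElement(person, "persName").text = record["name"]
        ET.SubElement(person, "occupation").text = record["occupation"]
    ET.indent(root)  # prettify in place; requires Python 3.9+
    return root


root = build_prosopography(TEMPLATE, COMEDIANS_JSON)
xml_text = ET.tostring(root, encoding="unicode")
```

Writing xml_text to a file would complete the final step; authors would be handled the same way with their own child elements.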
create_comedian_code()
Parameters:
- comedian_string (str): A string representing the comedian’s last name or pseudonym.
- person (Element): The <person> element to which the generated code will be applied.
- seen (list): A list that keeps track of previously generated codes to avoid duplicates.
Functionality:
- The method processes the comedian_string by removing spaces, apostrophes, and commas, then converts it to lowercase.
- It checks if the first four characters of the string have already been seen. If so, it appends a number to the string to create a unique ID.
- It sets the id attribute of the provided person element to the generated code.
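A minimal sketch of this logic follows. The description leaves some details open, so the sketch assumes that the code is the four-character prefix itself, that numbering of duplicates starts at 2, and that both straight and typographic apostrophes are stripped. It is not the repository's implementation.

```python
import xml.etree.ElementTree as ET


def create_comedian_code(comedian_string, person, seen):
    """Derive a short unique code from a name and store it as the id attribute."""
    cleaned = comedian_string
    # Assumption: both apostrophe variants are removed along with spaces and commas.
    for ch in (" ", "'", "’", ","):
        cleaned = cleaned.replace(ch, "")
    prefix = cleaned.lower()[:4]

    code = prefix
    suffix = 1
    while code in seen:  # prefix already used: try prefix2, prefix3, ...
        suffix += 1
        code = f"{prefix}{suffix}"

    seen.append(code)
    person.set("id", code)
    return code


seen = []
first = ET.Element("person")
second = ET.Element("person")
code_a = create_comedian_code("Dumesnil", first, seen)
code_b = create_comedian_code("Dumesnil", second, seen)
```

Here the second call collides on the prefix and receives a numbered code, so the two <person> elements end up with distinct id values.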
Methods in the JsonToXmlPlays class:
parse_comedians_jsons()
- Description: This is the main method that coordinates the process of reading data from JSON files and converting it into the XML format. It loads the plays, authors, attributions, and roles data, processes it, and creates the corresponding XML structure with appropriate TEI tags.
Functionality:
- Parse the XML template: The method begins by loading a pre-existing XML template that defines the base structure for the output file.
- Read and process the JSON files: It reads and processes four separate JSON files:
  - pièces.json (for play data)
  - auteurs.json (for author data)
  - attributions.json (for attributions of authors to plays)
  - rôles.json (for the roles of actors in plays)
- Create XML elements: For each play, the method adds an <item> element to the XML, along with bibliographic information (title, genre, author, etc.). The roles related to a play are represented as a <castList> with a <castItem> for each role.
- Generate unique IDs: The method generates unique IDs for plays and cast members to avoid duplicates, using helper methods.
- Write the XML output: The generated XML structure is written to the specified output file.
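The element-creation step might look roughly like the sketch below. The add_play function, the <bibl> wrapper, the <role> child, and the dictionary keys are assumptions made for illustration; only <item>, <castList>, and <castItem> are named in the description above.

```python
import xml.etree.ElementTree as ET


def add_play(parent, play, roles):
    """Append an <item> for one play, with its bibliographic data and cast list."""
    item = ET.SubElement(parent, "item")

    # Assumption: bibliographic fields are grouped under a <bibl> element.
    bibl = ET.SubElement(item, "bibl")
    ET.SubElement(bibl, "title").text = play["title"]
    ET.SubElement(bibl, "author").text = play["author"]

    # One <castList> per play, one <castItem> per role.
    cast_list = ET.SubElement(item, "castList")
    for role in roles:
        cast_item = ET.SubElement(cast_list, "castItem")
        ET.SubElement(cast_item, "role").text = role
    return item


plays = ET.Element("list")
item = add_play(
    plays,
    {"title": "Tartuffe", "author": "Molière"},
    ["Tartuffe", "Orgon", "Elmire"],
)
```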
create_title_code(title_string, item, seen_title)
- Description: This helper method generates a unique code for each play title. It cleans up the title string (removing spaces, punctuation, etc.) and ensures that titles with similar names have unique codes by appending numbers (e.g., PLAY1, PLAY2).
Parameters:
- title_string: The title of the play as a string.
- item: The XML <item> element associated with the play.
- seen_title: A list of previously generated title codes, used to avoid duplicating codes.
Functionality:
- It processes the play title by cleaning up the string (removing spaces, apostrophes, and commas) and ensures consistent formatting.
- If the first 11 characters of the cleaned title string have already been used, the method appends a number to make the title code unique.
- The unique title code is assigned as the id attribute of the <item> element.
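As a rough sketch of the behaviour described, the version below uses str.translate for the cleanup. Lowercasing the title, using the 11-character prefix as the code, and starting the numbering at 2 are all assumptions; the description does not specify them.

```python
import itertools
import xml.etree.ElementTree as ET

# Characters removed from titles; the exact set used by the project is an assumption.
STRIP_TABLE = str.maketrans("", "", " ',’")


def create_title_code(title_string, item, seen_title):
    """Build a title code from the first 11 cleaned characters, numbered on collision."""
    prefix = title_string.translate(STRIP_TABLE).lower()[:11]
    code = prefix
    if code in seen_title:
        # Take the first numbered variant that has not been used yet.
        code = next(
            candidate
            for candidate in (f"{prefix}{n}" for n in itertools.count(2))
            if candidate not in seen_title
        )
    seen_title.append(code)
    item.set("id", code)
    return code


seen_title = []
item_a = ET.Element("item")
item_b = ET.Element("item")
title_a = create_title_code("Le Malade imaginaire", item_a, seen_title)
title_b = create_title_code("Le Malade imaginaire", item_b, seen_title)
```

Truncating to a fixed prefix keeps codes short, at the cost of collisions between titles that share their opening words, which is why the numeric suffix is needed.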
create_cast_member_code(title_code, comedian_string, cast_item, seen_name)
- Description: This helper method generates a unique code for each cast member by combining the play’s title code and the cast member’s role name. This ensures that each cast member has a unique identifier within the context of a specific play.
Parameters:
- title_code: The unique code for the play.
- comedian_string: The role name (or cast member’s name) to generate a code for.
- cast_item: The XML <castItem> element associated with the cast member.
- seen_name: A list of previously generated cast member codes to avoid duplicates.
Functionality:
- Similar to create_title_code, it processes the cast member’s name (removing spaces and special characters).
- If the cleaned-up name (first 4 characters) has already been used, the method appends a number to make the cast member’s code unique.
- The method generates the final cast member code by combining the play’s title code and the unique name code and returns it.
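The combination step could be sketched as follows. The underscore separator, the lowercasing, the removal of every non-alphanumeric character, and the setting of the id attribute on cast_item are assumptions for illustration; the description only states that the combined code is returned.

```python
import xml.etree.ElementTree as ET


def create_cast_member_code(title_code, comedian_string, cast_item, seen_name):
    """Combine a play's title code with a unique four-character name code."""
    # Assumption: "special characters" means anything that is not a letter or digit.
    cleaned = "".join(ch for ch in comedian_string if ch.isalnum()).lower()
    prefix = cleaned[:4]

    name_code = prefix
    n = 1
    while name_code in seen_name:  # collision: try prefix2, prefix3, ...
        n += 1
        name_code = f"{prefix}{n}"
    seen_name.append(name_code)

    code = f"{title_code}_{name_code}"  # separator is an assumption
    cast_item.set("id", code)  # assumption: the id is also stored on the element
    return code


seen_name = []
cast_a = ET.Element("castItem")
cast_b = ET.Element("castItem")
member_a = create_cast_member_code("tartuffe", "Orgon", cast_a, seen_name)
member_b = create_cast_member_code("tartuffe", "Orgon", cast_b, seen_name)
```

In this sketch the caller would pass a fresh seen_name list for each play, so that name codes only need to be unique within one cast list.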